Mac Terminal
RStudio
GitHub homepage
Configure Git command line for my GitHub account:
git config --global user.name "Miguel Desmarais"
git config --global user.email "migdesmarais@email.com"
Create new portfolio directory and move into it:
mkdir MICB425_portfolio
cd MICB425_portfolio
Initialize directory as new local Git repo:
git init
Add files to the Git index (staging area):
git add .
Commit files to the local repo:
git commit -m "First commit"
Connect the local repo to the newly created GitHub repo:
git remote add origin https://github.com/migdesmarais/MICB425_portfolio.git
Verify web URL for new remote repo:
git remote -v
Push repo up to GitHub:
git push -u origin master
git add .
git commit -m "First commit"
git push -u origin master
The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.
http://phdcomics.com/ Comic posted 1-17-2018
The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
hint: go to the PhD Comics website to see if you can find the image above. If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown.
Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).
Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:
1231521+12341556280987
## [1] 1.234156e+13
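The output above is in scientific notation because R prints 7 significant digits by default. As a small sketch (the variable names `total` and `total_text` are illustrative and not part of the original assignment), the full sum can be displayed with `format()`:

```r
# Sum from the text; both operands are stored as doubles.
total <- 1231521 + 12341556280987

# Default printing keeps 7 significant digits, hence scientific notation.
print(total)

# format() with scientific = FALSE writes out every digit of the integer part.
total_text <- format(total, scientific = FALSE)
print(total_text)
```

The value is well below 2^53, so the double holds the sum exactly and no digits are lost.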
Or maybe, after you’ve added those numbers, you feel like it’s about time for a table!
I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.
library(knitr)
## Warning: package 'knitr' was built under R version 3.4.3
kable(summary(cars),caption="I made this table with kable in the knitr package library")
| speed | dist |
|---|---|
| Min. : 4.0 | Min. : 2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean :15.4 | Mean : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max. :25.0 | Max. :120.00 |
And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!
library(ggplot2)
metadata <- read.table(file="Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t", na.strings="NAN")
ggplot(metadata, aes(x=NO3_uM, y=Depth_m)) +
geom_point(shape=17, size=2, colour="purple") +
scale_y_reverse() +
xlab("Nitrate (uM)") +
ylab("Depth (m)") +
ggtitle("Nitrate levels")
library(ggplot2)
library(tidyverse)
metadata <- read.table(file="Saanich.metadata.txt", header=TRUE, row.names=1, sep="\t", na.strings="NAN")
metadata %>%
mutate(Temperature_F = (Temperature_C)*(9/5)+32) %>%
ggplot() + geom_point(aes(x=Temperature_F, y=Depth_m)) +
scale_y_reverse() +
xlab("Temperature (˚F)") +
ylab("Depth (m)") +
ggtitle("Temperature")
library(ggplot2)
library(tidyverse)
library(phyloseq)
load("phyloseq_object.RData")
physeq_percent = transform_sample_counts(physeq, function(x) 100 * x/sum(x))
plot_bar(physeq_percent, fill="Domain") +
geom_bar(aes(fill=Domain), stat="identity") +
xlab("Sample depth") +
ylab("% relative abundance") +
ggtitle("Domain from 10-200 m in Saanich Inlet", subtitle = NULL)
metadatag <- metadata %>%
gather(key="variable", value="value", O2_uM, PO4_uM, SiO2_uM, NO3_uM, NH4_uM, NO2_uM)
ggplot(metadatag) +
facet_wrap(~variable, scales="free_x") +
geom_path(aes(x=value, y=Depth_m)) +
scale_y_reverse() +
xlab("µM") +
ylab("Depth (m)") +
ggtitle("Nutrient concentration")
What is an estimate of the number and total carbon content of prokaryotes on Earth?
What are the main prokaryotic habitats on Earth?
Are the numbers of prokaryotes in different habitats consistent with their turnover times?
What is the total C, N and P content of prokaryotes on Earth?
What is the cellular production rate of prokaryotes on Earth and how is it related to genetic diversity?
The primary methodological approach in this paper is to review and evaluate previous literature on the topics of prokaryotic abundance, turnover time and cellular production rate in different habitats and to extract representative numbers. These numbers were averaged, compared and used to calculate hypothetical prokaryotic abundance, turnover time and cellular production rates representative of each habitat. Investigating different research results also assessed the uncertainty of results from certain habitats.
The estimate of prokaryotic abundance on Earth is 4-6 x 10^30 cells: 1.2 x 10^29 cells in the open ocean, 2.6 x 10^29 in soil, 3.5 x 10^30 in oceanic subsurface and 0.25-2.5 x 10^30 in terrestrial subsurface. The estimates were found to be consistent with previously calculated turnover times for each habitat.
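As a quick consistency check on the figures quoted above, the habitat estimates can be summed in R. This is only a sketch: the data frame layout and column names are assumptions made for illustration, and the low and high columns differ only for the terrestrial subsurface range.

```r
# Habitat estimates quoted in the text (cells).
habitats <- data.frame(
  habitat = c("open ocean", "soil", "oceanic subsurface", "terrestrial subsurface"),
  low     = c(1.2e29, 2.6e29, 3.5e30, 0.25e30),
  high    = c(1.2e29, 2.6e29, 3.5e30, 2.5e30)
)

# Sum each bound across habitats.
total_low  <- sum(habitats$low)
total_high <- sum(habitats$high)

# Show both totals to 3 significant digits.
print(signif(c(total_low, total_high), 3))
```

The totals come out near 4.1 x 10^30 and 6.4 x 10^30 cells, which rounds to the 4-6 x 10^30 range quoted above.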
The total carbon content of prokaryotes on Earth (350-550 Pg) is estimated to be 60-100% of the carbon content of land plants. In addition, the prokaryotic nitrogen (85–130 Pg) and phosphorus (9–14 Pg) content is about 10-fold the amount found in land plants.
Based on their estimated population size (4-6 x 10^30 cells) and cellular production rate (1.7 x 10^30 cells/yr), the authors argue that prokaryotes have a high capacity for genetic diversity, and that this diversity may be underestimated.
The open ocean has the highest cellular productivity and therefore this population is more likely to undergo mutations and rare genetic events.
Given their high diversity and mutation rates, should we change the way we define prokaryotic species?
Do experiments in laboratory settings properly reflect events in aquatic or soil environments?
Given their high abundance, do subsurface prokaryotes play a major role in biogeochemical cycles on Earth?
Does the open ocean have a unique role in the planet given its important potential for prokaryotic diversity?
The methods were not explained adequately. Most of the methods were based on calculations (and numbers taken from literature), which were not always clearly outlined in the text. Since some of these methods are not outlined, the results should be used cautiously. Many assumptions appear to have been made when calculating the carbon, nitrogen and phosphorus content, and thus these results must be taken as rough estimates only. Nevertheless, the conclusions were still justified. Even if the numbers calculated were not as precise as one would want them to be, they are probably still within an acceptable range of the correct values. They show the great abundance and production rate of prokaryotes on Earth and highlight their role within habitats. The tables were easy to understand and helped to get an idea of how important prokaryotic life is on Earth.
Mouse over the data points in the timeline below to see the key events in the evolution of Earth systems.
Hadean: dry surface at 500 degrees C, 90 bar pressure on Earth’s surface; carbon dioxide tied up in carbonate materials such as limestone.
Archaean: strong greenhouse gases predominate (methane and carbon dioxide).
Proterozoic: oxidation of the early atmosphere - microaerobic.
Phanerozoic: increased oxygenation of the atmosphere, mass extinction events from meteorite impacts.
This paper aims to set planetary boundaries that should not be overstepped during the Anthropocene in order to prevent human activities from causing irreparable environmental damage/change. Questions asked: What are these planetary boundaries? How can they be quantified? Have these planetary boundaries already been overstepped?
The authors have identified and quantified planetary boundaries with a “conservative, risk-averse approach”. The planetary boundaries are values selected for control variables that are within a reasonable distance from dangerous levels/thresholds. The methods used to do so are not clearly stated but are assumed to be based on previous literature. The current status of each boundary was taken from previous studies on climate change and the Anthropocene and assessed in relation to our current situation.
The authors identified 9 planetary boundaries: climate change, rate of biodiversity loss, interference with the nitrogen and phosphorus cycles (counted as one boundary), stratospheric ozone depletion, ocean acidification, global freshwater use, change in land use, atmospheric aerosol loading and chemical pollution.
Out of these 9 planetary boundaries, 3 have already been overstepped: climate change, rate of biodiversity loss and nitrogen cycle.
The boundaries are tightly coupled and need to be addressed as a whole.
New approach for defining preconditions for human development: “…address the scale of human action in relation to the capacity of Earth to sustain it, …work on understanding essential Earth processes including human actions and …research into resilience and its links to complex dynamics and self-regulation of living systems.”
What direct actions can be done to limit the damage caused by our activities?
Is it too late to reset the nitrogen or phosphorus cycles or to limit damage on biodiversity?
How strong is the link between these 9 planetary boundaries and is overstepping one boundary enough to push other boundaries towards their threshold?
What feedbacks and other repercussions will come from our current actions?
Are there other potential boundaries that we are not aware of today because they are not yet in danger?
The methods and approach used in this paper were not clearly stated. A part of the article could have been dedicated to describing the process by which these planetary boundaries and thresholds were identified and set. The figures and tables were useful to get a grasp on the current state of these planetary boundaries, but still no detailed methods were shown. The article is a great summary of the planet's current environmental condition. It is quite challenging to write about this topic since a lot of assumptions need to be made about the inter-connectivity of systems in order to set thresholds and boundaries. I believe a more quantitative or mathematically based study would be interesting.
Aquatic - prokaryotic abundance is an order of magnitude greater in the upper 200 m of the continental shelf and open ocean than in the deep ocean (>200 m). Fresh water values fall in a similar range. These numbers, of course, depend on the latitude of habitats and on unique environmental factors. Because of animal mixing and precipitation, the upper 10 cm of sediment in the ocean is included in aquatic habitat prokaryotic numbers. Aquatic prokaryotes have a short turnover time and therefore a high cellular productivity, which means that aquatic habitats have a great capacity to support life (varying with depth), especially in the photic zone. Total abundance (cells): 1.18 x 10^29
Soil - prokaryotes are ubiquitous in soil environments. The amount of carbon input to soil habitats is high and fuels prokaryotic growth and their role in the soil decomposition subsystem. Given their carbon content, soil habitats have a high capacity for supporting life. Total abundance (cells): 2.556 x 10^29
Subsurface - subsurface environments are major habitats for prokaryotes. They are divided into marine sediments below 10 cm and terrestrial habitats below 8 m. Organic matter deposited from the surface supports most of the subsurface biomass. Most of the prokaryotes found in subsurface habitats are in the upper 600 m layer. Total abundance (cells): 3.8 x 10^30
Abundance in upper 200 m of the ocean (cells): 3.6 x 10^28
Density (cells/mL): 5 x 10^5
(Cyanobacteria: (4 x 10^4 cells/mL) / (5 x 10^5 cells/mL) x 100 = 8%)
Fraction represented by Cyanobacteria including Prochlorococcus: 8%
Cyanobacteria such as Prochlorococcus produce their own energy from sunlight via photosynthesis, which in the process produces oxygen while fixing carbon. Despite only being 8% of the total prokaryotic cells in the upper 200 m of the ocean, they are responsible for approximately 50% of the oxygen in the atmosphere and contribute greatly to carbon cycling as demonstrated by their quick turnover time and resulting 8.2 x 10^29 cells/year.
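The 8% figure follows directly from the two densities quoted above. The sketch below reproduces it in R. The last step, which applies the same fraction to the whole upper 200 m, is an added assumption (a uniform cyanobacterial fraction) and not a value taken from the text. All variable names are illustrative.

```r
# Densities quoted in the text (cells/mL).
cyano_density <- 4e4
total_density <- 5e5

# Cyanobacterial share of prokaryotic cells, as a percentage.
cyano_percent <- cyano_density / total_density * 100
print(cyano_percent)

# ASSUMPTION: the same share holds throughout the upper 200 m of the ocean.
upper_200m_cells <- 3.6e28
cyano_cells <- upper_200m_cells * cyano_percent / 100
print(signif(cyano_cells, 2))
```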
Autotrophs: prokaryotes that produce their own food (fix CO2 to organic matter), primarily using energy from the sun.
Heterotrophs: prokaryotes that consume organic carbon to produce energy and synthesize building blocks of life.
Lithotrophs: prokaryotes that use inorganic compounds as an energy source to produce building blocks of life.
The Mariana Trench (10,994 meters, 11 km) is the deepest part of the ocean, and it is an environment that supports prokaryotic life. Because subsurface sediments below the water layer also support prokaryotic life, the argument could be made that the deepest habitat to host prokaryotic life is the subsurface sediment layer of the trench. Subsurface environments on land may contain prokaryotes at depths greater than that of the Mariana Trench. However, not much is currently known about life existing below these depths due to challenges in retrieving uncontaminated samples from these areas.
Deepest habitat: Mariana Trench (10,994 meters, 11 km) + 4.5 km including subsurface sediments, depending on temperature.
Primary limiting factor: temperature. Temperature increases with depth by about 22 ˚C/km.
Prokaryotes have been found in the atmosphere at altitudes as great as 57-77 km. Mount Everest (8,848 meters, 8.8 km) is the highest geographical location on Earth, and therefore would be the highest land habitat capable of supporting prokaryotic life. Is it capable of supporting prokaryotic life? Primary limiting factors at this height include temperature. Some prokaryotes, psychrophiles, have adapted to such low temperatures. Nutrients are also limited at high altitude; fewer atoms are found in the upper atmosphere and thus less material is available to compose the building blocks of life. This would result in slower growth. UV radiation as well as pressure are limiting to life at high altitudes because they can damage cells.
Highest habitat: Mount Everest (8,848 meters, 8.8 km).
Primary limiting factor: temperature and building blocks of life
Lower limit: Mariana Trench: 10,994 meters (11 km) deep + 4.5km = 15.5 km.
Upper limit: Mount Everest: 8,848 meters (8.8 km) high, but the upper limit is much higher if it includes the atmosphere as a “habitat”.
Vertical distance of the Earth’s biosphere: 15.5 km + 8.8 km = 24.3 km (+ potential atmosphere).
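The limits above can be added up in a few lines of R. This sketch uses the rounded values from the text (11 km, 4.5 km, 8.8 km, 22 ˚C/km). The variable names are illustrative, and the temperature rise is simply the quoted gradient applied over the 4.5 km of sediment.

```r
# Rounded values quoted in the text.
trench_depth_km    <- 11.0  # Mariana Trench
sediment_depth_km  <- 4.5   # habitable sediment below the trench floor
everest_height_km  <- 8.8   # Mount Everest
gradient_c_per_km  <- 22    # temperature increase with depth

# Lower limit and total vertical extent of the biosphere.
lower_limit_km     <- trench_depth_km + sediment_depth_km
vertical_extent_km <- lower_limit_km + everest_height_km

# Temperature rise across the habitable sediment layer.
temperature_rise_c <- sediment_depth_km * gradient_c_per_km

print(c(lower_limit_km, vertical_extent_km, temperature_rise_c))
```

With these inputs the sediment layer ends where the temperature has risen by roughly 99 ˚C, which fits the text's point that temperature is the primary limiting factor at depth.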
Annual cellular production (cells/year, reported in units of 10^29) was calculated with the following formula: cells/year = population size x 365 days / turnover time (days), or equivalently cells/year = population size x (turnovers/year).
Viruses are very important as modulators of metabolism but also community structure, abundance, and turnover time/rates. Viral proteins can impact the metabolic potential of prokaryotes.
Carbon content along with carbon assimilation efficiency determines the upper bound limit on the turnover rates seen in the upper 200 m of the ocean. This varies with depth in the ocean, and between terrestrial and marine habitats because the abundance of carbon in each habitat is different.
(1 year x 365 days/year x 24 hours/day) / ((4 x 10^-7)^4 mutations/cell x 8.2 x 10^29 cells/year) = 0.4 hours, i.e. 4 simultaneous mutations every 0.4 hours
Turnover time for marine heterotrophs above 200m: 16 days
365 days / 16 days = 22.8 turnovers/year
(3.6 x 10^28 cells) x 22.8 turnovers/year = 8.2 x 10^29 cells regenerated/year
Mutation rate per gene per DNA replication = 4 x 10^-7
4 simultaneous mutations in every gene shared by population: (4 x 10^-7)^4 = 2.56 x 10^-26
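The calculation above can be reproduced step by step in R. This is a minimal sketch using only the numbers quoted in the text. The helper `annual_production()` and the variable names are hypothetical, introduced here for illustration.

```r
# Hypothetical helper: cells/year = population size x 365 days / turnover time (days).
annual_production <- function(population, turnover_days) {
  population * 365 / turnover_days
}

# Marine heterotrophs above 200 m (values from the text).
population     <- 3.6e28  # cells
turnover_days  <- 16      # days
cells_per_year <- annual_production(population, turnover_days)

# Probability of 4 simultaneous mutations in one gene during one replication.
mutation_rate <- 4e-7
p_four        <- mutation_rate^4

# Expected number of such events per year, and hours between them.
events_per_year <- p_four * cells_per_year
hours_between   <- 365 * 24 / events_per_year

print(signif(cells_per_year, 2))
print(p_four)
print(round(hours_between, 1))
```

Carrying the unrounded turnover (22.8125 per year) through gives about 0.42 hours between events, which rounds to the 0.4 hours reported above.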
A large mutation rate means that there is a great potential for multiple point mutations in a single replication. This allows for quick adaptation by creating a more diverse pool of mutants to be selected from. Genetic diversity will be extremely high when small-scale changes to sequence are considered. Competition and environmental pressures will mostly determine long-term “species” level biodiversity. Horizontal gene transfer can allow new genes to proliferate in a microbial community assuming the gene is successful in the organism it is “born” in.
High abundance allows for high diversity by increasing the potential for mutations and simultaneous mutations. Metabolic potential is dependent on both abundance and diversity. Diversity determines the pool of available genes to be used in metabolic pathways and abundance determines the magnitude of the effect of these pathways.
Geophysical (abiotic): tectonic and atmospheric photochemical processes, which are based on acid/base chemistry.
Biochemical (biotic): microbial processes, which are based on redox reactions.
Abiotic processes rely on acid/base chemistry (the transfer of protons without electrons) while biotic processes rely on redox reactions (the transfer of electrons and protons). Therefore, biotic processes are mostly responsible for the transformation of energy while abiotic processes are mostly responsible for the transformation of matter. Some biotic processes can replenish substrates that are essential for life which otherwise would be depleted as abiotic processes come to thermodynamic equilibrium.
The earth’s redox state is considered an emergent property because it depends on both geochemical processes and microbial metabolic processes, which are both dynamic and always changing.
Reversible electron transfer reactions give rise to element and nutrient cycles because of the thermodynamic conditions that make each reaction favorable. Specifically, the rate at which each reaction occurs is determined by the conditions of the environment (i.e. abundance of substrates, products, etc.), thus allowing for nutrients to cycle in a stable manner. In this manner, the synergistic cooperation of different microbial communities can lead to different environments favorable to such reversible electron transfer reactions.
The different stages of the nitrogen cycle require different groups of microbes and different levels of oxygen, and are therefore partitioned between different redox “niches”.
During nitrogen fixation, nitrogen gas from the atmosphere is fixed to ammonium. This process occurs in surface waters where atmospheric nitrogen gas is in contact with water and therefore occurs in aerobic environments. Interestingly, the enzyme nitrogenase, which catalyzes this reaction, is inhibited by oxygen. Yet, nitrogen-fixing microbes have evolved a way to do so in aerobic environments.
Since oxygen is required to oxidize ammonium, nitrification occurs in aerobic environments and is a two-stage pathway. The two steps of nitrification (ammonia to nitrite, and nitrite to nitrate) are catalyzed by two different groups of microbes: ammonia-oxidizing bacteria/archaea and other nitrifying bacteria. These oxidation reactions are coupled to carbon fixation in many cases.
Denitrification, the anaerobic reduction of nitrate and nitrite to nitrogen gas, occurs in anaerobic environments and is coupled to oxidation of organic matter.
This shows how modular the nitrogen cycle is and how different redox “niches” are interconnected. In this connection, chemical species are exchanged between groups of microbes located in different redox niches and are utilized for different purposes.
Indirectly, the nitrogen cycle is connected to climate change. All microbes require nitrogen to synthesize protein and nucleic acids, and the only method of nitrogen fixation is via microorganisms. The nitrogen cycle is what controls the amount of available fixed nitrogen, which in turn affects the number of microbes carrying out various other reactions, which in turn produces the Earth’s atmosphere. The nitrogen cycle is also directly connected to climate change, as nitrous oxide can be produced during denitrification. Nitrous oxide is known as a greenhouse gas.
Although there is enormous genetic diversity in nature, there remains a relatively stable set of core genes coding for the major redox reactions essential for life and biogeochemical cycles. Thus, microbial diversity does not necessarily entail diversity in proteins involved in metabolism.
It is hypothesized that there is limitless evolutionary diversity in nature. The rate of discovery of unique protein families has been proportional to the sampling effort, with the number of new protein families increasing approximately linearly with the number of new genomes sequenced.
Microbes are considered guardians of metabolism on a temporary and simultaneous basis. This is because of the nature of microbial evolution through horizontal and vertical gene transfer, which can change which phenotype is dominant at a given time. A dominant phenotype protects the metabolic pathway in the environment. If it does not survive environmental perturbations (which apply selective pressures on pathway genes), it will disappear. Humans could possibly replicate the individual pathways, but the overall metabolic biogeochemical processes that control the flow of electrons can only be carried out by microbes.
We live in an epoch termed the Anthropocene in which human activity is the driving force behind climate and environmental changes. Since microbes play an undeniable role in shaping our planetary atmosphere and environment, humanity is becoming a dominant influence on microbial evolution (Gillings and Paulsen 2014). The idea of humans shaping microbial evolution, and thus a microbial Anthropocene, prompts the following question: Can humans survive without the global catalysis and environmental transformations that microbial life provides? While humans are master engineers, and thus may temporarily survive on a small-scale in the sudden event of a microbe-free world, they would certainly not be able to permanently maintain life as it is today on Earth. Such task would involve engineering complex networks mimicking microbial processes, and is nearly impossible to achieve on a global scale. Essential environmental transformations lacking on a germfree Earth include processes found in the nitrogen and the carbon/oxygen cycles, which sustain downstream processes that provide all of the oxygen we breathe and the food we consume. In addition, the microbial capacity to evolve and adapt to ever-changing environmental conditions has proven to be successful in the past and is required for the long-term survival of humans.
Since nitrogen is the main nutrient limiting life on our planet, the immediate problem with a germfree Earth is the collapse of the nitrogen cycle (Kuypers et al. 2018). Nitrogen-fixing, nitrifying and remineralizing prokaryotes found in soil and marine environments provide bioavailable nitrogen species to plants and phytoplankton, and thus regulate most of the primary production globally. Gilbert and Neufeld (2014) argue that, in the event of a microbe-free world, environmental nitrogen pools would be depleted within one week. Photosynthesis would come to a halt and Prochlorococcus and Synechococcus species, which are responsible for the emergence of Earth’s oxygen-rich atmosphere, would disappear (Kasting 2002). Both the air we breathe and the food we consume originate from primary production, and thus would eventually be depleted. The carbon/oxygen cycle would be disrupted: atmospheric oxygen levels would plummet and carbon dioxide would slowly build up as a byproduct of animal respiration and fossil fuel use. One could argue that technological advances could allow humans to engineer such biogeochemical cycles. Since the Haber-Bosch process already accounts for 45% of the world’s total fixed nitrogen, it could realistically produce fertilizer to supply global environments with the bioavailable nitrogen species needed for photosynthesis (Canfield et al. 2010). However, two main issues arise from the use of fertilizers. For one, heterotrophic microbial processes mediate organic matter decomposition, so without them biomass would accumulate worldwide. Dr. David Lipson argues that without microbial respiration, this would result in the eventual depletion of carbon dioxide and nitrogen gas from the atmosphere, a halt of photosynthesis worldwide, and potentially, a snowball Earth (Schaechter 2006). In addition, microbes are simply too abundant and omnipresent for their functions to be replaced.
Complete geochemical cycles would have to be synthetically engineered, implemented and monitored on a global scale. How would this be achieved with the current technology while there are still many thermodynamically feasible nitrogen-transforming reactions yet to be discovered (Kuypers et al. 2018)? The same efforts would have to be put forward for other biogeochemical processes such as the sulfur and the phosphorus cycles. While these issues could be addressed in a small-scale context, it is highly improbable that engineered systems could eventually replace the core set of genes responsible for global metabolic pathways on such a large scale.
Microbes evolved highly complex biogeochemical processes over ~4.5 billion years in response to physical and chemical changes in environmental conditions on Earth (Figure 1, Falkowski et al. 2008). In a hypothetical situation in which humans have engineered global metabolic pathways and survived on a germfree Earth, how would eventual changes in environmental conditions trigger the evolution of novel pathways? While microbes adapt new ways to react to environmental cues through horizontal gene transfer and selective pressure, humans or human-engineered chemical reactors don’t (Falkowski et al. 2008). Such microbial adaptations are often complex and can involve the interaction of multiple microbial communities and feedback responses (Canfield et al. 2010). Ultimately, chemical processes could be designed to try and address arising environmental changes, but the side effects of such processes would be hard to predict.
Figure 1 Biosphere model showing the basic biogeochemical cycles involved globally. Microbial processes are represented in the middle (Falkowski et al. 2008).
Would humans be able to manage the entirety of Earth’s vital biogeochemical cycles by engineering such cycles (Fig. 1)? Certainly not. Microbes are simply too abundant and omnipresent to do so. Would humans be able to engineer small-scale synthetic biospheres in which the environmental conditions are controlled by chemical reactors? Certainly. In fact, this is already being done to some extent across the globe in the “precision” food production industry where optimal growth conditions (temperature, humidity, water, light and nutrients) are created inside small-scale artificial environments (Rutherford 2017). A near future in which cities are surrounded by domes is not so eccentric and unbelievable after all. However, the idea of such life on Earth is synonymous with poor quality of life and biodiversity loss. Therefore, a more suitable question that arises from the idea of a germfree Earth is the following: What sacrifices are humans willing to make to sustain their current destructive activities?
While the role of microbes in global catalysis and environmental transformations has been discussed, microbes as symbionts in the gut microbiome and as a source of nutrients and cofactors in human diets have not been explored. The lack of these systems would introduce an extra layer of complexity to address the issue of a germfree Earth. Are humans smart enough to survive? We have two options to do so: a conservative or a destructive approach.
Canfield, D. E., Glazer, A. N., & Falkowski, P. G. (2010). The Evolution and Future of Earth’s Nitrogen Cycle. Science, 330(6001), 192-196. doi: 10.1126/science.1186120
Falkowski, P. G., Fenchel, T., & Delong, E. F. (2008). The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science, 320(5879), 1034-1039. doi: 10.1126/science.1153213
Gilbert, J. A., & Neufeld, J. D. (2014). Life in a World without Microbes. PLoS Biology, 12(12). doi: 10.1371/journal.pbio.1002020
Gillings, M. R., & Paulsen, I. T. (2014). Microbiology of the Anthropocene. Anthropocene, 5, 1-8. doi: 10.1016/j.ancene.2014.06.004
Kasting, J. F. (2002). Life and the Evolution of Earth’s Atmosphere. Science, 296(5570), 1066-1068. doi: 10.1126/science.1071184
Kuypers, M. M., Marchant, H. K., & Kartal, B. (2018). The microbial nitrogen-cycling network. Nature Reviews Microbiology, 16(5), 263-276. doi: 10.1038/nrmicro.2018.9
Nisbet, E. G., & Sleep, N. H. (2001). The habitat and nature of early life. Nature, 409(6823), 1083-1091. doi: 10.1038/35059210
Rockström, J., Steffen, W., Noone, K., Persson, Å, Chapin, F. S., Lambin, E. F., . . . Foley, J. A. (2009). A safe operating space for humanity. Nature, 461(7263), 472-475. doi: 10.1038/461472a
Rutherford, S. M. (2017, July 15). Artificial environments are turning the world outside in, but that’s no way to save the planet. Retrieved from https://www.independent.co.uk/environment/artificial-environments-are-turning-the-world-outside-in-but-that-s-no-way-to-save-the-planet-a7839951.html
Schaechter, M. (2006, December 19). Talmudic Question #4. Small Things Considered. Retrieved from http://schaechter.asmblog.org/schaechter/2006/12/talmudic_quesit.html
Whitman, W. B., Coleman, D. C., & Wiebe, W. J. (1998). Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences, 95(12), 6578-6583. doi: 10.1073/pnas.95.12.6578
Discuss the relationship between microbial community structure and metabolic diversity
Evaluate common methods for studying the diversity of microbial communities
Recognize basic design elements in metagenomic workflows
The authors of this paper wanted to get a more detailed characterization of the genetics and biochemistry of the proteorhodopsin (PR)-based photosystem in marine picoplankton communities of the photic zone.
What genes generate a fully functional PR photosystem? Do these genes allow photophosphorylation when cells are exposed to light (are they sufficient for photophosphorylation)? Why is the PR system ubiquitous in marine prokaryotes?
Screening: The authors used a large-insert DNA fosmid library prepared from ocean surface water picoplankton and screened for PR clones on retinal-containing LB agar medium. PR-containing clones were selected based on an orange or red phenotype, which is expected for such clones growing on this type of medium.
Sequencing: The authors used a collection of transposon-insertion clones to allow for rapid DNA sequencing and localization of insertion mutants. Insertion mutants allow for phenotypic analysis of gene function. The authors also used a fosmid system in which the copy number can be increased.
Phenotypic analysis was performed using selective as well as differential media.
Light-activated proton translocation was assessed by measuring light-dependent pH fluctuations. Photophosphorylation was measured using a luciferase-based assay.
Six genes are sufficient to generate a fully functional PR photosystem in E. coli and therefore in ocean surface water picoplankton. Out of these 6 genes, 5 encode accessory photopigment biosynthetic proteins while one encodes PR. This 6-gene system allows for light-activated proton translocation and for photophosphorylation (ATP synthesis) in cells. Given its small size, a single horizontal gene transfer event can lead to acquired phototrophic capabilities and could explain the ubiquity of the PR system in microbial communities of the ocean surface waters.
Can this method of identifying fosmid library clones be applied to other gene sets, and could other biomarkers or phenotypes be used to select for specific clones?
Are there other genetic systems that can be easily acquired by microbial populations in the environment?
Since PR photosystems have different UV absorption minima and maxima, would the type of PR acquired by a certain microbe be based on its environment, and therefore decrease its ability to thrive in other light environments?
Since the PR photosystem is so easily transferable, would more organisms contain it (more than the 13% of bacteria in marine picoplankton populations)?
The authors could have better introduced the current methods used to identify and characterize functional genes, and explained how their approach for PR is revolutionary. The methods were described adequately, but some background information, such as more detail on the fosmid library construction and other techniques, was missing and would have helped in understanding the experimental logic. More figures representing the library screening or the copy-control mechanism could have clarified some of the methods, but most figures were helpful. The conclusions are justified, although more studies would have to be done to confirm how easily PR genes transfer between organisms and whether phototrophy is an easily acquired capability.
As of 2014, 13 540 prokaryotic species with formal scientific names were included in the NCBI taxonomy database (Federhen 2014). Out of these, 13 029 are part of the bacteria domain and 511 of the archaea domain. This number does not include species without formal names and has most likely increased over the past year or so. Indeed, the number of “validly named” bacterial and archaeal species is expected to greatly surpass 13 000 by the end of 2016 (Amann and Rosselló-Móra 2016).
To touch more on prokaryotic “divisions”: according to a recent study by Solden et al. (2016), 89 bacterial phyla and 20 archaeal phyla had been identified by small subunit rRNA by 2016. However, as many as 1 500 bacterial phyla are thought to exist. In this context, “divisions” are considered as phylum-level lineages, which are the highest-level grouping.
According to a study by Hugenholtz (2002), there is a major bias towards the “big four” bacterial phyla since they are culturable in a lab environment. Numbers show that ~63% of phylum-level lineages have cultured representatives while ~37% do not. It is widely accepted that more than 99% of prokaryotes in an environmental sample are not culturable in a lab environment. Therefore, these numbers definitely show a bias towards culturable microorganisms. Note that “non-culturable” is a subjective term since improvements in growth media and a better understanding of unique prokaryotic environments could allow these organisms to grow in a lab environment. A slightly more recent study showed that out of 52 identifiable major lineages, only 26 had cultured representatives (Rappé and Giovannoni 2003). With recent advances in sequencing technology, this bias has likely decreased, and the numbers may now show a more realistic proportion of unculturable vs. culturable prokaryotes in online databases.
As per EBI Metagenomics:
As of today, 110 217 metagenomic data sets/projects exist on the EBI Metagenomics website. Most data sets are public but a few subsets are private. These project metagenomic sequences originate from a multitude of environments: soil, human, engineered, marine, freshwater, mammals, plants, grassland, etc.
As per the DOE Joint Genome Institute - IMG:
As of today, 9 877 metagenome sequencing projects are currently available in the DOE JGI public domain. These projects are sourced from air, aquatic and terrestrial environments. There are also non-environmental project sources from host-associated samples and engineering environments.
Shotgun metagenomics:
Data warehousing: IMG/M, MG-RAST, NCBI. These are databases with different levels of curated datasets.
Assembly: EULER-SR
Binning: S-GSOM
Annotation: KEGG
Analysis pipelines: Megan 5
Marker gene metagenomics:
Standalone software: OTUbase
Analysis pipelines: SILVA (gold standard for rRNA identification)
Denoising: AmpliconNoise
Databases: Ribosomal Database Project (RDP, gold standard for rRNA identification)
Phylogenetic gene anchors are slowly evolving marker genes containing variable regions, such as 16S rRNA/rDNA, that can be used to taxonomically identify microorganisms. For example, the 16S rRNA gene can be used in metagenome analysis to identify metabolically active prokaryotes in a select sample. In addition, the phylogenetic gene anchor alone can be sequenced to provide an estimate of diversity and community composition. A phylogenetic tree can then be built based on how different these variable regions are between microorganisms (Krause et al. 2008).
Functional gene anchors are similar to phylogenetic gene anchors but represent certain metabolic pathways present in certain members of the community. These markers can be used to track the metabolic potential of a community, but can also be used to build a phylogenetic tree within a specific subgroup of a community sharing the same metabolic function and see how its members are related (The New Science of Metagenomics 2007).
Metagenomic sequence binning is the process by which sequence data is assigned to its original OTU. It groups together sequence reads that come, or are thought to come, from the same genome. A typical contamination threshold used in this process is 5% (meaning that 5% of sequence reads inside a bin do not belong to this OTU/bin).
Types of algorithmic approaches used to produce sequence bins include composition-based binning (TETRA, GSOM, S-GSOM) and phylogenetic binning (or similarity-based binning; MEGAN, CARMA). Composition-based binning uses characteristics such as GC content, k-mer content and codon usage, while phylogenetic binning uses reference sequences to classify each sequence into its correct bin or OTU.
Composition-based binning is great since it does not require reference sequences. However, it may be problematic when OTUs are closer in identity and more numerous in a sample. Phylogenetic binning is great when sequences in the sample have similarities to reference sequences but is not adequate for identifying organisms that have never been sequenced, since they lack reference sequences. Risks associated with metagenomic binning include grouping reads that have similar characteristics (k-mer content, codon usage, etc.) but do not originate from the same cell. Defining what a species is and setting thresholds is a crucial part of this process.
Sequencing of the phylogenetic gene anchor 16S rRNA can be done to understand the metabolic activity of the microbes in the community. This method, however, only provides the community composition and not the metabolic potential. On this note, the same type of sequencing can be done with functional gene anchors to understand what metabolic pathways are present in a sample community.
Universal primers need to be selected prior to the experiment for amplifying and sequencing these gene anchors. Risks associated with this method include the selection of inappropriate primers: if microorganisms in a sample differ greatly from the best-known species, this experiment will result in many OTUs or functional gene anchors not being sequenced, and therefore not being identified. In addition, horizontal gene transfer and differences in gene copy number between organisms can also alter results.
Other alternatives: FISH probing, functional screens (biochemical, etc.), 3rd generation sequencing (Nanopore), single cell sequencing (single-cell amplified genomes, SAGs).
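The composition-based features mentioned above can be sketched in a few lines of base R. This is only a toy illustration: the reads, the function names `gc_content` and `kmer_counts`, and the 0.5 GC cutoff are all hypothetical, and real binning tools such as TETRA or S-GSOM use far more elaborate statistics.

```r
# Toy reads (hypothetical sequences, not taken from any dataset)
reads <- c(read1 = "ATGCGCGCGGCCGCGT",
           read2 = "ATATTTAAATATTAAA",
           read3 = "GGCCGCGCATGCGGCC",
           read4 = "TTAAATATATTTAATA")

# GC content: fraction of bases in a read that are G or C
gc_content <- function(read) {
  bases <- strsplit(read, "")[[1]]
  mean(bases %in% c("G", "C"))
}

# k-mer content: counts of every overlapping k-mer in a read (tetranucleotides by default)
kmer_counts <- function(read, k = 4) {
  starts <- seq_len(nchar(read) - k + 1)
  table(substring(read, starts, starts + k - 1))
}

gc <- sapply(reads, gc_content)

# Crude two-bin assignment using an arbitrary GC cutoff of 0.5
bins <- ifelse(gc > 0.5, "high_GC", "low_GC")
bins
```

No reference sequences are needed here, which is the strength of the approach, but two unrelated genomes with similar GC content would land in the same bin, which is the risk described above.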
Amann, R., & Rosselló-Móra, R. (2016). After All, Only Millions? MBio, 7(4). doi: 10.1128/mbio.00999-16
Federhen, S. (2014). Type material in the NCBI Taxonomy Database. Nucleic Acids Research, 43(D1). doi: 10.1093/nar/gku1127
Hugenholtz, P. (2002). Exploring prokaryotic diversity in the genomic era. Genome Biol, 3(2). doi: 10.1186/gb-2002-3-2-reviews0003
Krause, L., Diaz, N. N., Goesmann, A., Kelley, S., Nattkemper, T. W., Rohwer, F., . . . Stoye, J. (2008). Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Research, 36(7), 2230-2239. doi: 10.1093/nar/gkn038
Madsen, E. L. (2005). Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews Microbiology, 3(5), 439-446. doi: 10.1038/nrmicro1151
Martinez, A., Bradley, A. S., Waldbauer, J. R., Summons, R. E., & Delong, E. F. (2007). Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proceedings of the National Academy of Sciences, 104(13), 5590-5595. doi: 10.1073/pnas.0611470104
Rappé, M. S., & Giovannoni, S. J. (2003). The Uncultured Microbial Majority. Annual Review of Microbiology, 57(1), 369-394. doi: 10.1146/annurev.micro.57.030502.090759
Solden, L., Lloyd, K., & Wrighton, K. (2016). The bright side of microbial dark matter: lessons learned from the uncultivated majority. Current Opinion in Microbiology, 31, 217-226. doi: 10.1016/j.mib.2016.04.020
The New Science of Metagenomics. (2007). doi: 10.17226/11902
Wooley, J. C., Godzik, A., & Friedberg, I. (2010). A Primer on Metagenomics. PLoS Computational Biology, 6(2). doi: 10.1371/journal.pcbi.1000667
Evaluate the concept of microbial species based on environmental surveys and cultivation studies.
Explain the relationship between microdiversity, genomic diversity and metabolic potential
Comment on the forces mediating divergence and cohesion in natural microbial communities
The aim of this study was to investigate and compare the genomes of three Escherichia coli strains: uropathogenic CFT073, enterohemorrhagic EDL933 and laboratory MG1655. To compare such strains on an evolutionary level, the authors investigated their genome content (combined non-redundant set of proteins) and their pathogenicity. Questions asked: How similar or different are the genomes of these three strains? How does the disease potential vary between them? How did these pathotypes acquire some similar, but also some different genes? These questions fit in the bigger picture of understanding “the genetic bases for pathogenicity and the evolutionary diversity of E. coli”.
Primary methodological approaches include whole-genome libraries, random clone sequencing (on Applied Biosystems ABI377 and 3700 automated sequencers) and assembly of sequence data by SEQMANII. The assembly was finalized by sequencing opposite ends of linking clones, PCR-based techniques, primer walking and a whole-genome XhoI optical map. Finally, the genome of each strain was annotated using MAGPIE. Codon usage was analyzed to identify horizontal gene transfer events.
Comparison of CFT073 and MG1655 revealed that DNA segments unique to each strain are inserted in a conserved backbone of 3.92 Mb. These DNA segments are termed specific islands. The CFT073-specific island is 1.303 Mb and the MG1655-specific island is 715.7 kb. Similar specific islands were found in EDL933. Rare codon usage was observed in the island genes, suggesting horizontal gene transfer as their source. These islands were shown to differ in content between the three strains but to be located at similar positions along the conserved backbone. These islands can be termed pathogenicity islands and confer disease potential to different strains. The authors concluded that the introduction of the high pathogenicity island was probably one of the earliest events in the evolution of extraintestinal E. coli. The ability of different strains to inhabit certain environments is based on such pathogenicity islands. In this way, CFT073 has different islands containing factors, such as the foc operon, that can contribute to its ability to infect urinary tract tissues. Such factors include genes encoding fimbrial adhesins, autotransporters, iron-sequestration systems, and phase-switch recombinases. The authors concluded that, compared to J96 and 536, the CFT073 strain must have arisen independently from clone lineages.
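As a rough illustration of how codon usage can flag island genes, the sketch below tabulates codon frequencies for two made-up open reading frames. The sequences and the helper names `codon_usage` and `codon_freq` are hypothetical and are not taken from the paper; CTA is used as the example because it is generally a rarely used leucine codon in E. coli.

```r
# Split an open reading frame into codons and return their relative frequencies
codon_usage <- function(orf) {
  starts <- seq(1, nchar(orf) - 2, by = 3)
  codons <- substring(orf, starts, starts + 2)
  prop.table(table(codons))
}

# Frequency of a single codon, returning 0 when the codon is absent
codon_freq <- function(usage, codon) {
  if (codon %in% names(usage)) as.numeric(usage[codon]) else 0
}

# Hypothetical toy genes: leucine is encoded by CTG in one and by CTA in the other
backbone_gene <- "ATGCTGGCGCTGCTGGAACTGTAA"
island_gene   <- "ATGCTAGCACTACTAGAACTATAA"

backbone_cta <- codon_freq(codon_usage(backbone_gene), "CTA")
island_cta   <- codon_freq(codon_usage(island_gene), "CTA")
c(backbone = backbone_cta, island = island_cta)
```

A gene whose codon profile departs strongly from the rest of the genome, as the toy island gene does here, is a candidate for horizontal acquisition, although codon bias alone is only suggestive evidence.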
To what extent do other pathogens have conserved backbones and specific insertion sites within their genomes? Do these sites provide an evolutionary advantage to different pathogens?
How important is horizontal gene transfer in conferring pathogenicity and is there a mechanism by which it inserts islands at these “universal insertion targets”?
To what extent should these events be considered when taxonomically classifying microorganisms?
The methodological background was sufficient to understand the results. However, more details about the methods would be required for readers wanting to recreate the genome sequencing and assembly. The methods by which the genome was analyzed, such as codon usage, are straightforward. The tables and figures definitely helped with the concepts of a “conserved backbone” and “universal insertion targets”.
Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
Identify common molecular signatures used to infer genomic identity and cohesion
Differentiate between mobile elements and different modes of gene transfer
The horizontal line on the following figure represents the conserved backbone that is shared amongst the CFT073 and EDL933 E. coli strains. The vertical lines show the size of islands not shared between the two strains. The lines are positioned at “universal insertion targets” that are common to both strains. These insertions confer the ecotype designation to both strains such that CFT073 is a uropathogenic strain and EDL933 is an enterohemorrhagic strain. Different genes within these islands are unique to each strain and result in different pathogenicity. The definition of ecotype in the context of the human body would be that of a distinct form of a species that can inhabit particular habitats within the human body. It is therefore adapted to specific environmental conditions within the human body. The CFT073 genome contains genes that encode fimbrial adhesins, autotransporters, iron-sequestration systems, and phase-switch recombinases. Fimbrial adhesins, for example, are often associated with disease. Such adhesins could be one of the elements providing traits to CFT073 that allow it to be adapted to the urinary tract. In addition, the autotransporters often confer virulence. These genes have been acquired through horizontal gene transfer. Since horizontal gene transfer occurs between two organisms that are proximal in space, these insertions reflect environments and environmental conditions this strain may have been adapted to in the past.
Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island, British Columbia.
Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.
Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.
library(kableExtra)
library(knitr)
library(tidyverse)
fullcommunity = read.csv("fullcommunity.csv")
fullcommunity %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
| number | name | characteristics | occurences |
|---|---|---|---|
| 1 | orangebear | orange gummy bear | 14 |
| 2 | pinkbear | pink gummy bear | 16 |
| 3 | yellowbear | yellow gummy bear | 16 |
| 4 | whitebear | white gummy bear | 16 |
| 5 | redbear | red gummy bear | 10 |
| 6 | green bear | green gummy bear | 18 |
| 7 | pinksolidbear | solid pink gummy bears | 1 |
| 8 | redsolidbear | solid red gummy bears | 1 |
| 9 | jellybeanyellow | yellow jelly bean | 26 |
| 10 | jellybeanorange | orange jelly bean | 31 |
| 11 | jellybeanpink | pink jelly bean | 37 |
| 12 | jellybeangreen | green jelly bean | 33 |
| 13 | jellybeanred | red jellybean | 38 |
| 14 | jellyrodsyellow | long yellow jelly rods | 3 |
| 15 | jellyrodsorange | long orange jelly rods | 1 |
| 16 | jellyrodsred | long red jelly rods | 2 |
| 17 | jellyrodsblack | long black jelly rods | 1 |
| 18 | redswirl | red swirl | 1 |
| 19 | blueswirl | blue swirl | 2 |
| 20 | ovalyellow | yellow oval | 1 |
| 21 | redfilamentous | red, long string-like | 5 |
| 22 | orangespider | orange spider-shaped | 2 |
| 23 | pinkspider | pink spider-shape | 3 |
| 24 | purplespider | purple spider-shape | 1 |
| 25 | coke-bottlered | red coke bottle | 1 |
| 26 | cokebottlepink | pink coke bottle | 1 |
| 27 | cokebottleyellow | yellow coke bottle | 1 |
| 28 | kissessilver | silver kisses | 15 |
| 29 | bigbluebrick | large blue brick | 1 |
| 30 | bigyellowbrick | large yellow brick | 1 |
| 31 | bigpinkbrick | large pink brick | 1 |
| 32 | smallbluebrick | small blue brick | 3 |
| 33 | small yellow brick | small yellow brick | 3 |
| 34 | smallgreenbrick | small green brick | 2 |
| 35 | smallpinkbrick | small pink brick | 6 |
| 36 | biggreen | large round green jelly | 5 |
| 37 | bigpurple | large round purple jelly | 3 |
| 38 | bigred | large round red jelly | 7 |
| 39 | bigorange | large round orange jelly | 5 |
| 40 | bigyellow | large round yellow jelly | 3 |
| 41 | m&morange | orange m&m | 59 |
| 42 | m&mgreen | green m&m | 30 |
| 43 | m&mred | red m&m | 27 |
| 44 | m&myellow | yellow m&m | 34 |
| 45 | m&mbrown | brown m&m | 30 |
| 46 | m&mblue | blue m&m | 60 |
| 47 | skittlesyellow | yellow skittle | 34 |
| 48 | skittlesorange | orange skittle | 39 |
| 49 | skittlered | red skittle | 30 |
| 50 | skittlebrown | brown skittle | 30 |
| 51 | skittlegreen | green skittle | 42 |
mycommunity = read.csv("mycommunity.csv")
mycommunity %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
| number | name | characteristics | occurences |
|---|---|---|---|
| 1 | orangebear | orange gummy bear | 1 |
| 2 | pinkbear | pink gummy bear | 2 |
| 3 | yellowbear | yellow gummy bear | 5 |
| 4 | whitebear | white gummy bear | 5 |
| 5 | greenbear | green gummy bear | 4 |
| 6 | jellyrodsyellow | long yellow jelly rods | 6 |
| 7 | jellyrodsorange | long orange jelly rods | 9 |
| 8 | jellyrodsred | long red jelly rods | 5 |
| 9 | redswirl | red swirl | 1 |
| 10 | blueswirl | blue swirl | 1 |
| 11 | ovalyellow | yellow oval | 1 |
| 12 | redfilamentous | red, long string-like | 2 |
| 13 | cokebottlepink | pink coke bottle | 1 |
| 14 | kissessilver | silver kisses | 7 |
| 15 | smallbluebrick | small blue brick | 1 |
| 16 | smallyellowbrick | small yellow brick | 1 |
| 17 | smallpinkbrick | small pink brick | 1 |
| 18 | biggreen | large round green jelly | 1 |
| 19 | bigpurple | large round purple jelly | 1 |
| 20 | bigred | large round red jelly | 1 |
| 21 | bigorange | large round orange jelly | 1 |
| 22 | m&morange | orange m&m | 8 |
| 23 | m&mgreen | green m&m | 9 |
| 24 | m&mred | red m&m | 3 |
| 25 | m&myellow | yellow m&m | 10 |
| 26 | m&mbrown | brown m&m | 7 |
| 27 | m&mblue | blue m&m | 8 |
| 28 | skittlesyellow | yellow skittle | 5 |
| 29 | skittlesorange | orange skittle | 9 |
| 30 | skittlered | red skittle | 5 |
| 31 | skittlebrown | brown skittle | 1 |
| 32 | skittlegreen | green skittle | 10 |
My collection of microbial cells from seawater does not fully represent the actual diversity of microorganisms inhabiting waters along the Line-P transect. Only 32 of the 51 species in the full community were identified in my sample.
library(ggplot2)
collector = read.csv("collector.csv")
ggplot(collector, aes(x=x, y=y)) +
geom_point() +
geom_smooth() +
labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
The collector’s curve for my sample does not flatten out.
I can conclude from the shape of my collector’s curve that the depth of my sampling is not the greatest. Since the curve does not flatten out, more species would most likely be found if more sampling was done. The collector’s curve and this conclusion agree with the difference in the number of species found in my sample and in the full community (32 and 51, respectively).
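For reference, the x and y values of a collector's curve can be derived directly from the order in which individuals were classified. The sketch below uses a short, made-up sequence of draws (not my actual sampling order) and a hypothetical data frame name, `collector_sketch`, with the same `x` and `y` columns that the plotting code above expects.

```r
# Hypothetical order in which individuals were drawn and classified
draws <- c("m&mblue", "skittlegreen", "m&mblue", "kissessilver",
           "skittlegreen", "whitebear", "m&mblue", "redswirl")

collector_sketch <- data.frame(
  x = seq_along(draws),            # cumulative number of individuals classified
  y = cumsum(!duplicated(draws))   # cumulative number of species observed
)
collector_sketch
```

Each first appearance of a species raises y by one, so the curve only flattens once new draws stop turning up species that have not been seen before.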
\(\frac{1}{D}\) where \(D = \sum p_i^2\)
\(p_i\) = the fractional abundance of the \(i^{th}\) species
The higher the value is, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects the number of species present (richness) and the relative proportions of each species within a community (evenness), this is a diversity metric. Consider that a community can have the same number of species (equal richness) but manifest a skewed distribution in the proportion of each species (unequal evenness), which would result in different diversity values.
orangebear = 1/132
pinkbear = 2/132
yellowbear = 5/132
whitebear = 5/132
greenbear = 4/132
jellyrodsyellow = 6/132
jellyrodsorange = 9/132
jellyrodsred = 5/132
redswirl = 1/132
blueswirl = 1/132
ovalyellow = 1/132
redfilamentous = 2/132
cokebottlepink = 1/132
kissessilver = 7/132
smallbluebrick = 1/132
smallyellowbrick = 1/132
smallpinkbrick = 1/132
biggreen = 1/132
bigpurple = 1/132
bigred = 1/132
bigorange = 1/132
mnmorange = 8/132
mnmgreen = 9/132
mnmred = 3/132
mnmyellow = 10/132
mnmbrown = 7/132
mnmblue = 8/132
skittlesyellow = 5/132
skittlesorange = 9/132
skittlered = 5/132
skittlebrown = 1/132
skittlegreen = 10/132
1 / (orangebear^2 + pinkbear^2 + yellowbear^2 + whitebear^2 +
greenbear^2 + jellyrodsyellow^2 + jellyrodsorange^2 + jellyrodsred^2 + redswirl^2 + blueswirl^2 + ovalyellow^2 + redfilamentous^2 + cokebottlepink^2 + kissessilver^2 + smallbluebrick^2 + smallyellowbrick^2 + smallpinkbrick^2 + biggreen^2 + bigpurple^2 + bigred^2 + bigorange^2 + mnmorange^2 + mnmgreen^2 + mnmred^2 + mnmyellow^2 + mnmbrown^2 + mnmblue^2 + skittlesyellow^2 + skittlesorange^2 + skittlered^2 + skittlebrown^2 + skittlegreen^2)
## [1] 19.89041
Calculated on spreadsheet: 23.1744939
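The same index can also be computed from a vector of counts instead of one variable per species. This is a minimal base R sketch; the function name `inverse_simpson` is my own, and the counts are the occurrences from the table for my sample, in table order.

```r
# Occurrences for the 32 species in my sample, in table order
counts <- c(1, 2, 5, 5, 4, 6, 9, 5, 1, 1, 1, 2, 1, 7, 1, 1,
            1, 1, 1, 1, 1, 8, 9, 3, 10, 7, 8, 5, 9, 5, 1, 10)

# Simpson Reciprocal Index: 1 / sum(p_i^2)
inverse_simpson <- function(counts) {
  p <- counts / sum(counts)   # fractional abundance of each species
  1 / sum(p^2)
}

inverse_simpson(counts)        # matches the value above, about 19.89

# A perfectly even community reaches the maximum, its number of species
inverse_simpson(rep(4, 32))
```

The second call illustrates the statement that the index peaks at the number of species when all species contain an equal number of individuals.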
Another way to calculate diversity is to estimate the number of species that are present in a sample based on the empirical data to give an upper boundary of the richness of a sample. Here, we use the Chao1 richness estimator.
\(S_{chao1} = S_{obs} + (\frac{a^2}{2b})\)
\(S_{obs}\) = total number of species observed
a = species observed once
b = species observed twice or more
\(S_{chao1}\) =
32 + 13^2/(2*19)
## [1] 36.44737
\(S_{chao1}\) =
51 + 13^2/(2*38)
## [1] 53.22368
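The calculations above take b as the number of species observed twice or more. The commonly cited form of Chao1 instead takes b as the number of species observed exactly twice (doubletons), which can change the estimate considerably. The sketch below computes both variants for my sample; the function name `chao1_sketch` and its `b_rule` argument are hypothetical and written only for this comparison.

```r
# Occurrences for the 32 species in my sample, in table order
counts <- c(1, 2, 5, 5, 4, 6, 9, 5, 1, 1, 1, 2, 1, 7, 1, 1,
            1, 1, 1, 1, 1, 8, 9, 3, 10, 7, 8, 5, 9, 5, 1, 10)

# S_chao1 = S_obs + a^2 / (2 * b), with two possible definitions of b
chao1_sketch <- function(counts, b_rule = c("twice_or_more", "exactly_twice")) {
  b_rule <- match.arg(b_rule)
  s_obs <- sum(counts > 0)    # species observed
  a <- sum(counts == 1)       # species observed once
  b <- if (b_rule == "twice_or_more") sum(counts >= 2) else sum(counts == 2)
  s_obs + a^2 / (2 * b)       # returns Inf if b is 0
}

chao1_sketch(counts, "twice_or_more")   # definition used in this problem set
chao1_sketch(counts, "exactly_twice")   # doubleton definition
```

With only two doubletons in my sample, the doubleton definition gives a much larger estimate than the value calculated above, so the definition of b should be stated whenever a Chao1 value is reported.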
library(tidyverse)
library(dplyr)
library(vegan)
mycommunity_diversity =
mycommunity %>%
select(name, occurences) %>%
spread(name, occurences)
fullcommunity_diversity =
fullcommunity %>%
select(name, occurences) %>%
spread(name, occurences)
mycommunity_diversity
## biggreen bigorange bigpurple bigred blueswirl cokebottlepink greenbear
## 1 1 1 1 1 1 1 4
## jellyrodsorange jellyrodsred jellyrodsyellow kissessilver m&mblue
## 1 9 5 6 7 8
## m&mbrown m&mgreen m&morange m&mred m&myellow orangebear ovalyellow
## 1 7 9 8 3 10 1 1
## pinkbear redfilamentous redswirl skittlebrown skittlegreen skittlered
## 1 2 2 1 1 10 5
## skittlesorange skittlesyellow smallbluebrick smallpinkbrick
## 1 9 5 1 1
## smallyellowbrick whitebear yellowbear
## 1 1 5 5
fullcommunity_diversity
## bigbluebrick biggreen bigorange bigpinkbrick bigpurple bigred bigyellow
## 1 1 5 5 1 3 7 3
## bigyellowbrick blueswirl coke-bottlered cokebottlepink cokebottleyellow
## 1 1 2 1 1 1
## green bear jellybeangreen jellybeanorange jellybeanpink jellybeanred
## 1 18 33 31 37 38
## jellybeanyellow jellyrodsblack jellyrodsorange jellyrodsred
## 1 26 1 1 2
## jellyrodsyellow kissessilver m&mblue m&mbrown m&mgreen m&morange m&mred
## 1 3 15 60 30 30 59 27
## m&myellow orangebear orangespider ovalyellow pinkbear pinksolidbear
## 1 34 14 2 1 16 1
## pinkspider purplespider redbear redfilamentous redsolidbear redswirl
## 1 3 1 10 5 1 1
## skittlebrown skittlegreen skittlered skittlesorange skittlesyellow
## 1 30 42 30 39 34
## small yellow brick smallbluebrick smallgreenbrick smallpinkbrick
## 1 3 3 2 6
## whitebear yellowbear
## 1 16 16
Simpson Reciprocal Index for my sample:
diversity(mycommunity_diversity, index="invsimpson")
## [1] 19.89041
Simpson Reciprocal Index for community:
diversity(fullcommunity_diversity, index="invsimpson")
## [1] 23.17449
The Simpson Reciprocal Index values obtained from the R function match the values calculated for my sample and the community.
Chao1 richness estimate for my sample:
specpool(mycommunity_diversity)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 32 32 0 32 0 32 32 0 1
Chao1 richness estimate for community:
specpool(fullcommunity_diversity)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 51 51 0 51 0 51 51 0 1
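The Chao1 values from the R function do not match the values I calculated for my sample and the community. specpool() is designed for species incidence across several sampling sites; since each of my data frames contains a single row (n = 1), it has no replicate sites to extrapolate from and simply returns the observed number of species as the Chao estimate. Therefore, my calculated Chao1 values are slightly higher for both my sample and the community. In vegan, estimateR() is the function intended for abundance-based Chao1 from the counts of a single sample.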
If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.
The measure of diversity, alpha diversity in this case, directly depends on the definition of species in my sample and community. Since both the Simpson Reciprocal Index and the Chao1 value are essentially based, in different ways, on the total number of species observed, changing the definition of species will have an effect on the number of species observed and on alpha diversity.
For example, many studies use the 97% identity threshold to define species. Raising this value to 99% would result in more species observed in a sample (overestimating diversity), while lowering this threshold to 95%, for example, would result in more “identical” sequences and therefore fewer species observed in a sample (underestimating diversity).
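The effect of the threshold can be sketched with base R hierarchical clustering. The five sequences and their pairwise dissimilarities (1 minus fractional identity) below are invented for illustration, and complete linkage is an assumption on my part; OTU-picking tools differ in the clustering algorithm they use.

```r
# Hypothetical pairwise dissimilarities (1 - fractional identity) for five sequences
ids <- c("A", "B", "C", "D", "E")
d <- matrix(0.10, 5, 5, dimnames = list(ids, ids))
d[1:4, 1:4] <- 0.04
d[1:3, 1:3] <- 0.02
d[1:2, 1:2] <- 0.005
diag(d) <- 0

# Complete linkage: every pair within a cluster must meet the threshold
tree <- hclust(as.dist(d), method = "complete")

# Number of OTUs ("species") obtained at a given identity threshold
otus_at <- function(identity) {
  length(unique(cutree(tree, h = 1 - identity)))
}

sapply(c("99%" = 0.99, "97%" = 0.97, "95%" = 0.95), otus_at)
```

The same five sequences yield four, three and two OTUs at 99%, 97% and 95% identity, respectively, so richness-based metrics change with the threshold even though the underlying data do not.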
Binning or clustering data based on the GC content, codon usage or percent identity will most likely result in a different number of species observed in a sample since these methods are based on different sequence characteristics.
Inflated diversity has previously been associated with pyrosequencing errors. Even Sanger, a sequencing technology with a relatively low error rate, has shown a constant overestimation of diversity. Therefore, sequencing technologies with a higher base-calling/sequencing error rate, such as PacBio or Oxford Nanopore, will most likely result in many more species observed in a sample. This is due to the potential introduction of wrong bases in a given DNA sequence. This may in turn lead to the sequence falling below the 97% identity threshold and therefore being categorized as a different, new species observed in the sample.
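A small worked example shows how a handful of miscalled bases can split one organism into two OTUs. The 100-base reference, the error positions and the helper names `percent_identity` and `introduce_errors` are all hypothetical, and the comparison assumes ungapped sequences of equal length.

```r
# Percent identity between two ungapped sequences of equal length
percent_identity <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  100 * sum(x == y) / length(x)
}

# Substitute the bases at the given positions to mimic sequencing errors
introduce_errors <- function(read, positions) {
  bases <- strsplit(read, "")[[1]]
  bases[positions] <- ifelse(bases[positions] == "A", "C", "A")
  paste(bases, collapse = "")
}

reference <- strrep("ACGT", 25)   # hypothetical 100-base marker sequence

low_error_read  <- introduce_errors(reference, c(10, 50))
high_error_read <- introduce_errors(reference, c(10, 30, 50, 70))

percent_identity(reference, low_error_read)    # 98: still within a 97% OTU
percent_identity(reference, high_error_read)   # 96: counted as a new OTU
```

Both reads come from the same template, yet the one with four errors falls below the 97% threshold and would be scored as an additional species.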
Higher organisms have long been identified and classified by phenotypic traits: the simplest idea of a species is defined as a group of organisms capable of interbreeding and producing fertile offspring (King et al. 2006). This concept, however, becomes inadequate when trying to taxonomically classify asexual organisms such as prokaryotes. Microbial species have historically been named after diseases they cause or essential industrial processes they catalyze, but recent advances in sequencing technology have changed the field of taxonomy. It has allowed for the genotypic identification of species through sequence similarity methods such as DNA-DNA hybridization, rRNA gene sequence and multilocus sequence typing (MLST) (Gevers et al. 2005). Challenges associated with defining microbial species at the genotype level lie within the concepts of mosaic evolution and species ecotypes fuelled by mechanisms of horizontal gene transfer (HGT). They include setting thresholds for a shared evolutionary history, phenotypic consistency and genome content. HGT influences the spread of genetic diversity within and between species and therefore must be considered when investigating the evolution and phylogenetic distribution of microbial metabolic pathways in the environment. It is the basis behind the concept of “microbes as guardians of metabolism” (Falkowski et al. 2008), and thus the maintenance of global biogeochemical cycles through time. Such evolutionary events account for species ecotypes with highly divergent genomes and metabolic potentials, and thus prompt the following questions: What are species and is it reasonable to aim for a clear taxonomical definition?
The most accepted definition of a prokaryotic species describes individuals sharing a common evolutionary history, a certain degree of phenotypic consistency and a high degree of similarity in many independent features such as genome content (Gevers et al. 2005). While this definition is appropriate for prokaryotes with narrow phenotypes and genome content, it is not suitable for other microorganisms prone to HGT. HGT mechanisms, such as transformation, conjugation and transduction, can occur between very distantly related microorganisms and at a relatively high rate compared to higher organisms (Brigulla and Wackernagel 2010). The frequency of successful genetic material transfer between prokaryotes is based on factors that correlate with genetic relatedness such as metabolic compatibility, adaptation to environmental conditions, gene expression system and gene-transfer mechanisms (Gogarten et al. 2002), but also on broad factors such as propinquity, genome size and genome G/C composition (Jain et al. 2003). Because of the above factors regulating HGT, the immediate environment in which populations are found influences their genetic material. Gene transfer across species can lead to genetic divergence between individual populations within a species such that their genomes may receive genes with different evolutionary histories, but also with unique metabolic and adaptive potentials (Gogarten et al. 2002). These HGT events are behind the concept of mosaic or modular evolution, which is defined as evolutionary changes resulting in changes in some body parts or systems without a simultaneous change in all parts (King et al. 2006). Mosaic evolution leads to the emergence of ecotypes (Figure 1), which are defined as genetically cohesive and ecologically distinct populations (Gogarten et al. 2002). HGT therefore introduces three challenges to the definition of species.
For one, ecotypes can have extremely diverse metabolic potentials and ecology, and thus clearly defining the concept of microbial species at the phenotype level can be difficult. In addition, the introduction of different gene phylogenies within populations challenges the concept of a shared common evolutionary history (Banfield 2005). Finally, as seen in Welch et al. (2002), strains can share as little as 39.2% of genes within their genomes. These findings question the idea of shared genome content between prokaryotic species (Welch et al. 2002). HGT “uncouples the evolution of phenotype from the evolution of the bulk of the genome”, which increases genetic diversity and genome evolution, and complicates matters in terms of classifying prokaryotes into well-defined and distinct species (Doolittle and Papke 2006).
Figure 1 Image showing (a) the ecotype species concept, (b) the biological species concept and (c) random lineage splitting and extinction concept (Doolittle and Papker 2006).
Global biogeochemical cycles are highly modular: they consist of key processes carried out by a variety of microbes that act in synergy within the cycle. These processes have evolved and survived within microbial communities throughout evolutionary time (Madsen 2005). HGT facilitates the maintenance of such cycles by providing a selective advantage to species and core sets of genes. While Gogarten and Townsend (2005) argue that gene transfer can be neutral and can occur at high rates, selective pressure eventually results in the molecular detection of transfer events that increase the fitness of the recipient only. Some of the phenotypic traits conferred through HGT can allow newly formed ecotypes to colonize novel ecological niches and therefore be more competitive. In this way, species (or ecotypes!) can temporarily be the “guardians of metabolism” until environmental selective pressure leads to the extinction of such phenotypes (Falkowski et al. 2008). An extinction event does not necessarily lead to the lost of a given metabolic pathway since HGT allows for multiple species to simultaneously possess a certain gene. This suggests that the spatial distribution of certain biogeochemical cycles, influenced by the ecotypes temporarily possessing the core genes, may have changed over time. Thus, HGT can provide a selective advantage to the host and core genes and can spatially shape global biogeochemical cycles. HGT is a mechanism that insures robust continuity and preservation over evolutionary time of functions such as those involved in biogeochemical cycles.
Finally, clearly setting a definition of a microbial species across the scientific community is essential to allow for comparative studies of microbial environments. Doing so is not a simple task, since shared evolutionary history, phenotypic consistency and genome content can vary greatly among individual species. Ultimately, the problem of defining species is a conflict between the will to create categories and the will to understand evolutionary groups (Hey 2001). Approaches such as the “ecotype model”, which considers both a history of co-existence and a prognosis of future existence between individual prokaryotes, could serve as a taxonomical definition (Gevers et al. 2005). Humans have an intrinsic need to dissect and categorize components of their world in order to understand them. However, there is no essential reason why evolutionary processes driving microbial diversity would lead to groups of microorganisms that are sufficiently distinct, based on genotype, to be classified into “species”. As biologists, we must broaden the principles behind the field of population genetics to appreciate the highly biodiverse (!!!) world we live in.
Banfield, J. F. (2005). The Search for a Molecular-Level Understanding of the Processes that Underpin the Earth's Biogeochemical Cycles. Reviews in Mineralogy and Geochemistry, 59(1), 1-7. doi: 10.2138/rmg.2005.59.1
Brigulla, M., & Wackernagel, W. (2010). Molecular aspects of gene transfer and foreign DNA acquisition in prokaryotes with regard to safety issues. Applied Microbiology and Biotechnology, 86(4), 1027-1041. doi: 10.1007/s00253-010-2489-3
Doolittle, W. F., & Papke, R. T. (2006). Genomics and the bacterial species problem. Genome Biology, 7(9), 116. doi: 10.1186/gb-2006-7-9-116
Falkowski, P. G., Fenchel, T., & Delong, E. F. (2008). The Microbial Engines That Drive Earth's Biogeochemical Cycles. Science, 320(5879), 1034-1039. doi: 10.1126/science.1153213
Gevers, D., Cohan, F. M., Lawrence, J. G., Spratt, B. G., Coenye, T., Feil, E. J., . . . Swings, J. (2005). Re-evaluating prokaryotic species. Nature Reviews Microbiology, 3(9), 733-739. doi: 10.1038/nrmicro1236
Gogarten, J. P., & Townsend, J. P. (2005). Horizontal gene transfer, genome innovation and evolution. Nature Reviews Microbiology, 3(9), 679-687. doi: 10.1038/nrmicro1204
Hey, J. (2001). The mind of the species problem. Trends in Ecology & Evolution, 16(7), 326-329. doi: 10.1016/s0169-5347(01)02145-0
King, R. C., Stansfield, W. D., & Mulligan, P. K. (2006). A dictionary of genetics (7th ed., p. 286). Oxford University Press. ISBN: 0-19-530761-5
Kunin, V., Engelbrektson, A., Ochman, H., & Hugenholtz, P. (2010). Wrinkles in the rare biosphere: Pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology, 12(1), 118-123. doi: 10.1111/j.1462-2920.2009.02051.x
Lundin, D., Severin, I., Logue, J. B., Östman, Ö., Andersson, A. F., & Lindström, E. S. (2012). Which sequencing depth is sufficient to describe patterns in bacterial α- and β-diversity? Environmental Microbiology Reports, 4(3), 367-372. doi: 10.1111/j.1758-2229.2012.00345.x
Madsen, E. L. (2005). Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nature Reviews Microbiology, 3(5), 439-446. doi: 10.1038/nrmicro1151
Welch, R. A., Burland, V., Plunkett, G., Redford, P., Roesch, P., Rasko, D., . . . Blattner, F. R. (2002). Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences, 99(26), 17020-17024. doi: 10.1073/pnas.252529799
(n.d.). What is Biodiversity? Retrieved from https://cnx.org/contents/6FG_qx3D@1/What-is-Biodiversity-A-compari
Analysis of sequences obtained from Saanich Inlet using mothur and QIIME2 revealed peak community alpha-diversity at a depth of approximately 100 m, where the dissolved oxygen concentration is approximately 38 uM. The lowest diversity was observed at greater depths with lower oxygen levels. Further analysis at the phylum level revealed Proteobacteria as the most abundant phylum in all samples. To investigate how microbial communities differ across depth and oxygen gradients within Saanich Inlet, we focused on the phylum Chloroflexi. Chloroflexi abundance was positively correlated with depth; this correlation was significant only when the analysis was based on QIIME2-generated ASVs, not mothur-generated OTUs. Chloroflexi abundance was also negatively correlated with oxygen concentration and, similarly, significance was only found using QIIME2-generated ASVs. QIIME2 identified four classes within the phylum Chloroflexi (Dehalococcoidia, Anaerolineae, SAR202 and JG30-KF-CM66), while mothur identified two (Anaerolineae and SAR202). The abundances of individual OTUs and ASVs were also tested for correlation with depth and oxygen concentration, and none of these correlations was significant. The abundances of 24 (of 34) OTUs and 38 (of 47) ASVs were positively correlated with depth, while all OTUs and 46 ASVs were negatively correlated with oxygen concentration. However, apparent outliers in the data from both mothur and QIIME2 may have biased the trend lines and reduced model significance, although there are too few data points to classify them as outliers with confidence. Lastly, the abundances of some OTUs and ASVs were significantly and positively correlated with hydrogen sulfide levels. The differences observed between mothur and QIIME2 show that the choice of pipeline can affect the biological conclusions drawn.
Saanich Inlet is a seasonally anoxic fjord [1] located between Vancouver Island and the Saanich Peninsula. It is 24 km long and has a basin of up to 234 meters in depth [2]. It has a 75-meter sill which acts to isolate the deeper waters [3]. Because of this sill and the constantly high input of organic material from freshwater discharge and primary production in surface waters, its conditions below 110 meters are anoxic [3]. Oxygen is replenished seasonally, mostly in the fall, which modifies the oxygen gradient and thereby the environmental conditions for the microbial community that inhabits the inlet [3]. Dissolved oxygen increases gradually from a minimum concentration at greater depth up to its peak concentration at the surface, due to phytoplankton metabolism and gas exchange between the atmosphere and surface waters [3]. Nitrate reduction by denitrifiers happens mostly in the deep water following oxygenation [3]. This results in a steep nitrate gradient across depths within the fjord [3]. A study by Zaikova et al. found that microbial diversity was highest in the hypoxic transition area and decreased within the anoxic basin waters [1]. It is vital to study the roles of various microorganisms within Saanich Inlet in order to understand how they affect processes such as greenhouse gas production, methane cycling, and denitrification on a larger scale in the world’s oceans [3].
Oxygen minimum zones (OMZs) are places within the ocean where oxygen saturation is lowest, typically occurring at depths of about 200 to 1000 meters [4]. OMZs are normally found along the western boundaries of continents, where upwelling can bring nutrient-rich water from deep within the ocean up to the surface [5]. They can also be found in coastal basins where restricted circulation limits mixing of deep and surface waters [5]. In addition, human activity can affect the size of periodic dead zones (in the coastal ocean and enclosed basins) when agricultural runoff mixes with marine waters - this process is called eutrophication [5]. Important examples include the dead zones in the Mississippi River Delta and Chesapeake Bay in the United States. Unlike these other zones, Saanich Inlet in British Columbia is a glacially carved fjord whose entrance sill restricts oxygen-rich water from entering the basin [5].
Operational Taxonomic Units (OTUs) are defined as clusters of organisms that have been grouped based on DNA sequence similarity of a specific DNA segment known as a taxonomic marker gene [6]. The grouped DNA sequences differ by less than a fixed and arbitrary sequence dissimilarity threshold, often 3% [7]. This process of clustering on a specific DNA segment, known as DNA barcoding, allows for rapid, targeted, and high-throughput analysis of genetic variation in a specific genomic region such as 16S/18S rRNA sequences, leading to large-scale characterization of microbial communities [6, 8]. However, more recent amplicon sequence variant (ASV) methods have been developed that offer finer resolution and are independent of the dissimilarity thresholds used to define OTUs. ASV methods have shown higher specificity and sensitivity than OTU methods, as they distinguish sequence variants that differ by as little as a single nucleotide and denoise the sequences by discriminating biological sequences from errors. This is done based on the expectation that biological sequences are more abundant and more repeatedly observed than error-containing sequences [7].
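The contrast between threshold-based OTUs and exact sequence variants can be illustrated with a small sketch. It is not the mothur or QIIME2 algorithm: the reads are made up, the helper `mutate_at` is hypothetical, average-linkage clustering is an assumed stand-in for OTU clustering, and the "ASV" step simply counts distinct sequences without any of the denoising that real ASV methods perform.

```r
# Hypothetical 100-bp reference read built by repeating a motif.
ref <- strsplit(paste(rep("ACGT", 25), collapse = ""), "")[[1]]

# Illustrative helper: replace the bases at positions `pos` with `base`.
mutate_at <- function(x, pos, base) {
  x[pos] <- base
  x
}

reads <- list(
  r1 = ref,
  r2 = mutate_at(ref, 10, "T"),                  # 1 mismatch vs r1 (1%)
  r3 = mutate_at(ref, c(10, 50), "T"),           # 2 mismatches vs r1 (2%)
  r4 = mutate_at(ref, seq(1, 100, by = 4), "G")  # 25 mismatches vs r1 (25%)
)

# Pairwise dissimilarity = fraction of mismatching positions.
n <- length(reads)
d <- matrix(0, n, n, dimnames = list(names(reads), names(reads)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- mean(reads[[i]] != reads[[j]])
  }
}

# OTU-style grouping: cluster, then cut the tree at 3% dissimilarity.
otu <- cutree(hclust(as.dist(d), method = "average"), h = 0.03)

# ASV-style grouping (no denoising here): each distinct sequence is a unit.
seq_strings <- vapply(reads, paste, character(1), collapse = "")
asv <- match(seq_strings, unique(seq_strings))

length(unique(otu))  # number of OTUs
length(unique(asv))  # number of exact variants
```

In this toy case the three near-identical reads collapse into one OTU while remaining separate exact variants, which is the resolution difference described above.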
Using OTU and ASV data for samples collected from the Saanich Inlet, we investigated how microbial communities differ across depth and oxygen gradients within the Saanich Inlet, with a particular focus on the phylum Chloroflexi. We found Chloroflexi of interest because its members are highly abundant in marine sediments [9] and present a broad spectrum of metabolic characteristics such as anoxygenic photosynthesis [10], obligate aerobic and anaerobic heterotrophy [11], and even predation with a gliding motility [12]. Like many other microbes, members of Chloroflexi can be difficult to grow in culture, with some classes yet to be cultured successfully, which has made characterizing their metabolisms a challenge [13, 21]. However, new sequencing technologies have made it possible to analyze the genomes of these uncultured microbes and characterize them [13-21].
In our Saanich Inlet data, we were able to identify four classes within the phylum Chloroflexi: Dehalococcoidia, Anaerolineae, SAR202, and JG30-KF-CM66. Members of the class Dehalococcoidia are widely distributed throughout marine sediments [13] and anoxic deep waters [14]. Dehalococcoidia grow via anaerobic organohalide respiration and are extensively studied for their potential in the bioremediation of chloride-contaminated water and soil [13, 14]. As for the class Anaerolineae, despite its members being prevalent in various ecosystems, only a few strains have been successfully cultured [15]. Anaerolineae compose one of the core populations of anaerobic bacteria involved in anaerobic digestion and possess key genes for catalyzing cellulose hydrolysis [16]. The SAR202 cluster was one of the earliest discoveries of marine bacteria which inhabited the aphotic zone [17], and since then SAR202 has been found to be ubiquitous throughout the deep ocean [18]. Members of SAR202 are involved in metabolizing organosulfur compounds and likely play a major role in sulfur cycling [19]. JG30-KF-CM66 is a relatively uncharacterized clade of acidobacteria, but it has been identified in soft coal slags [20] and anoxic ocean water [21]. The characteristics of each of these classes impact how the spatial distribution of the Chloroflexi phylum differs within Saanich Inlet. We also set out to determine if and how different sequence processing pipelines, specifically mothur and QIIME2, would impact these biological conclusions.
Water samples from 16 depths (10-200 m) from cruise 72 were collected at station S3 (48°35.500 N, 123°30.300 W) onboard MSV John Strickland. Geochemical and multi-omic information, which included 16S rRNA gene amplicon sequences (V4-5 hypervariable regions) and dissolved O2, was extracted for each depth [22, 23]. Data from 7 depths (10, 100, 120, 135, 150, 165, 200 m) were further analyzed. Dissolved O2 was measured onboard by the Sea-Bird SBE 43 dissolved oxygen sensor [22]. At each depth, 2 L of water sample were filtered onto 0.22 μm Sterivex filters and stored until amplicon sequencing, which was carried out on the Illumina MiSeq platform at the Joint Genome Institute. Base qualities were encoded in Phred33, and primers 515F and 806R were used for 16S rRNA gene amplification [23].
Reads were independently processed using mothur- and QIIME2-based pipelines, which group sequences into OTUs and ASVs, respectively. The resulting data were assembled into phyloseq objects for downstream analysis in R. For a comprehensive outline and description of the commands used for each pipeline, refer to the R markdown documents provided by Kim Dill-McFarland (2018; mothur pipeline) and Julia Beni (2018; QIIME2 pipeline).
Sequenced reads were first assembled into contigs, which were screened and de-duplicated so that the remainder 1) were between 200bp and 600bp long, 2) had fewer than 8 homopolymers and 3) had no ambiguous bases.
Contigs were trimmed such that they would only align to bases 10368 to 25434 in the SILVA database (release 128), and uninformative bases were removed. Resultant sequences were de-duplicated again.
Sequences were then pre-clustered and clustered de novo (using 97% sequence similarity) to determine the final OTUs. Chimeric sequences were also filtered away.
Clusters were classified using the SILVA database, and resulting taxonomies were condensed.
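The screening criteria in the first mothur step can be sketched in base R as a set of logical filters. This is only an illustration on made-up contigs, not the mothur implementation; in particular, it reads the homopolymer criterion as "no run of 8 or more identical bases", which is an assumption, and `max_homopolymer` is a hypothetical helper.

```r
# Illustrative helper: length of the longest run of one repeated base.
max_homopolymer <- function(s) max(rle(strsplit(s, "")[[1]])$lengths)

# Hypothetical contigs (not taken from the Saanich Inlet data).
base300 <- paste(rep("ACGT", 75), collapse = "")      # 300 bp, no long runs
contigs <- c(
  ok        = base300,
  too_short = paste(rep("ACGT", 25), collapse = ""),  # 100 bp
  ambiguous = paste0(base300, "N"),                   # contains an ambiguous base
  homopol   = paste0(base300, strrep("A", 9)),        # run of 9 identical bases
  ok_dup    = base300                                 # exact duplicate of "ok"
)

keep <- nchar(contigs) >= 200 & nchar(contigs) <= 600 &    # 1) length window
  vapply(contigs, max_homopolymer, integer(1)) < 8 &       # 2) homopolymer runs
  !grepl("[^ACGT]", contigs)                               # 3) no ambiguous bases

# De-duplicate whatever passes the screen.
screened <- unique(contigs[keep])
names(contigs)[keep]
length(screened)
```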
Reads were demultiplexed and imported into QIIME2. Per-base read qualities were visualized to determine downstream trim parameters.
ASVs were generated with the DADA2 protocol, using custom trim parameters for quality control. Resultant ASVs were classified against the SILVA database (release 119) at a 99% similarity threshold.
The ASV table was then converted to a text format, which was used to create a phyloseq object.
Analysis was completed in R v3.4.3 [24] using the following packages.
library(tidyverse)
library(phyloseq)
library(ggplot2)
library(dplyr)
library(stringr)
library(magrittr)
library(knitr)
library(gridExtra)
library(grid)
library(randomcoloR)
Alpha-diversities of clusters identified by mothur and QIIME2 from each sample were measured by the Shannon diversity index and the Chao1 richness estimator. Alpha-diversities were plotted against sample depth and oxygen concentration for both clustering methods, and were fitted using local polynomial regression models where appropriate. Relative abundances of all phylum level classifications produced by mothur and QIIME2 were also plotted for each sample.
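For reference, both alpha-diversity measures can be computed by hand from a vector of counts. The sketch below uses made-up counts and the function names `shannon` and `chao1` are illustrative. The Shannon index is taken as -sum(p * ln p), and Chao1 is written in one common bias-corrected form; the packages used in this analysis may differ in detail, so this is meant to show what the measures capture rather than to reproduce their output.

```r
# Shannon diversity index: -sum(p_i * ln(p_i)) over observed taxa.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Chao1 richness (bias-corrected form): observed taxa plus a correction
# based on the number of singletons (f1) and doubletons (f2).
chao1 <- function(counts) {
  s_obs <- sum(counts > 0)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Hypothetical sample: 9 observed taxa, 3 singletons, 2 doubletons.
counts <- c(50, 30, 10, 5, 2, 2, 1, 1, 1)
shannon(counts)
chao1(counts)
```

Chao1 depends only on how many taxa are seen and how many of them are rare, whereas Shannon also responds to how evenly reads are spread across taxa.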
Relative abundances of Chloroflexi OTUs and ASVs amongst all clusters were plotted, both as a whole and individually, across depth and oxygen gradients. Significances of correlations between these variables were based on linear regression models, as all variables are continuous and there is a lack of evidence to suggest curvilinear relationships between them.
# Generate random colours for use in figures.
palette <- distinctColorPalette(40)
Data were loaded into R and samples normalized to 100,000 sequences per sample.
load("mothur_phyloseq.RData")
load("qiime2_phyloseq.RData")
# Random seed set for reproducibility
set.seed(4831)
# Data normalized to 100,000 sequences per sample
m.norm = rarefy_even_depth(mothur, sample.size=100000)
q.norm = rarefy_even_depth(qiime2, sample.size=100000)
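What normalizing to an even depth does can be shown on a toy example. The sketch below subsamples reads without replacement in base R; it is not the phyloseq implementation, whose defaults may differ (for example in whether sampling is done with replacement). The function name `rarefy_counts` and the counts are hypothetical.

```r
# Subsample a vector of taxon counts down to `depth` reads.
rarefy_counts <- function(counts, depth) {
  # Expand to one taxon label per read, then draw without replacement.
  reads <- rep(seq_along(counts), times = counts)
  picked <- sample(reads, size = depth, replace = FALSE)
  tabulate(picked, nbins = length(counts))
}

set.seed(4831)  # fixed seed so the random subsample is repeatable

# Two hypothetical samples with very different sequencing depths.
deep    <- c(taxonA = 6000, taxonB = 3000, taxonC = 1000)
shallow <- c(taxonA = 550,  taxonB = 350,  taxonC = 100)

depth <- min(sum(deep), sum(shallow))
rarefied <- rbind(
  deep    = rarefy_counts(deep, depth),
  shallow = rarefy_counts(shallow, depth)
)
rowSums(rarefied)  # both samples now have the same number of reads
```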
Relative abundance percentages were calculated for the data.
m.percent = transform_sample_counts(m.norm, function(x) 100 * x/sum(x))
q.percent = transform_sample_counts(q.norm, function(x) 100 * x/sum(x))
The phylum Chloroflexi was chosen.
phylum_name_mothur = "Chloroflexi"
phylum_name_qiime2 = "D_1__Chloroflexi"
Shannon diversity index and Chao1 were calculated for the total microbial community across depth and oxygen concentration gradients for both mothur and QIIME2.
# Alpha-diversity of total community for mothur
# Calculate Chao1 and Shannon
m.alpha = estimate_richness(m.norm, measures = c("Chao1", "Shannon"))
# Combine Chao1 and Shannon data with the rest of the geochemical data into 1 data frame
m.meta.alpha = full_join(rownames_to_column(m.alpha),
rownames_to_column(data.frame(m.percent@sam_data)), by = "rowname")
# Save plots for the different combinations (Shannon vs Chao1 across depth vs oxygen)
m.shannon.depth.plot <- m.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Shannon)) +
geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Shannon)) +
labs(title="mothur", y="Shannon diversity index", x=NULL)
m.chao1.depth.plot <- m.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Chao1)) +
geom_smooth(method='auto', aes(x=as.numeric(Depth_m), y=Chao1)) +
labs(title="mothur", y="Chao1 richness estimator", x="Depth (m)")
m.shannon.o2.plot <- m.meta.alpha %>%
ggplot() +
geom_jitter(aes(x=O2_uM, y=Shannon), width = 5, shape = 1) +
labs(title="mothur", y="Shannon diversity index", x=NULL)
m.chao1.o2.plot <- m.meta.alpha %>%
ggplot() +
geom_jitter(aes(x=O2_uM, y=Chao1), width = 5, shape = 1) +
labs(title="mothur", y="Chao1 richness estimator", x="Oxygen (uM)")
# Alpha-diversity of total community for QIIME2
# Calculate Chao1 and Shannon
q.alpha = estimate_richness(q.norm, measures = c("Chao1", "Shannon"))
# Combine Chao1 and Shannon data with the rest of the geochemical data into 1 data frame
q.meta.alpha = full_join(rownames_to_column(q.alpha),
rownames_to_column(data.frame(q.percent@sam_data)), by = "rowname")
# Save plots for the different combinations (Shannon vs Chao1 across depth vs oxygen)
q.shannon.depth.plot <- q.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Shannon)) +
geom_smooth(method='loess', aes(x=as.numeric(Depth_m), y=Shannon)) +
labs(title="QIIME2", y=NULL, x=NULL)
q.chao1.depth.plot <- q.meta.alpha %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Chao1)) +
geom_smooth(method='loess', aes(x=as.numeric(Depth_m), y=Chao1)) +
labs(title="QIIME2", y=NULL, x="Depth (m)")
q.shannon.o2.plot <- q.meta.alpha %>%
ggplot() +
geom_jitter(aes(x=O2_uM, y=Shannon), width = 5, shape = 1) +
labs(title="QIIME2", y=NULL, x=NULL)
q.chao1.o2.plot <- q.meta.alpha %>%
ggplot() +
geom_jitter(aes(x=O2_uM, y=Chao1), width = 5, shape = 1) +
labs(title="QIIME2", y=NULL, x="Oxygen (uM)")
# Plotting alpha-diversities versus depth
grid.arrange(m.shannon.depth.plot, q.shannon.depth.plot, m.chao1.depth.plot, q.chao1.depth.plot, ncol=2)
Figure 1 Alpha-diversity of samples collected at Saanich Inlet across depth. Sequence processing was done with both mothur and QIIME2. Points were fitted using local polynomial regression fitting. 95% confidence band is displayed in grey.
The same patterns of alpha-diversity (Shannon diversity index and the Chao1 richness estimator) can be observed across depth for both mothur and QIIME2 (Figure 1). There is a slightly lower diversity in surface waters (10 m) compared to 100 m depth. Peak diversity is reached at ~100-120 m, after which diversity decreases with greater depth, with a slight increase at 200 m for all but the Shannon diversity index for QIIME2.
Note, however, that despite the similarity in the alpha-diversity pattern, the comparison of mothur versus QIIME2 shows a difference: across all depths, mothur OTU analysis resulted in a lower alpha-diversity than the QIIME2 ASV analysis when measured with the Shannon diversity index and a higher alpha-diversity than the QIIME2 ASV analysis when measured with Chao1.
# Plotting alpha-diversities versus oxygen concentration
grid.arrange(m.shannon.o2.plot, q.shannon.o2.plot, m.chao1.o2.plot, q.chao1.o2.plot, ncol=2)
Figure 2 Alpha-diversity of samples collected at Saanich Inlet across oxygen concentration. Sequence processing was done with both mothur and QIIME2. Points are jittered to show 2 sets of 2 closely overlapping points revealed by the mothur pipeline.
Looking at Shannon diversity across oxygen concentration (Figure 2), we find that at equivalent depths QIIME2 has a greater diversity than mothur. However, the pattern exhibited by both mothur and QIIME2 data is still similar. The three lowest diversity points are at an oxygen concentration of 0 uM, while the highest diversity is found at an oxygen concentration of ~38 uM. The band of 95% confidence intervals was not plotted due to the lack of data between ~38 uM and ~217 uM of oxygen.
Comparing Chao1 at different oxygen levels for mothur and QIIME2 shows that the patterns somewhat differ. While the three lowest diversity points are still at 0 uM of oxygen, for mothur the highest diversity in terms of Chao1 is at an oxygen concentration of ~38 uM, while for QIIME2 it is at an oxygen concentration of ~32 uM. For both, oxygen concentration of ~217 uM shows a notable decrease in diversity. Chao1 exhibited a relatively greater drop at ~217 uM of oxygen compared to Shannon.
# Save plot for phylum abundance composition for mothur
m.phyla.plot = m.percent %>%
plot_bar(fill="Phylum")+
geom_bar(aes(fill=Phylum), stat="identity")+
labs(y="Abundance (%)")+
scale_fill_manual(values=palette)+
theme(legend.text=element_text(size=11))
# Save plot for phylum abundance composition for QIIME2
q.phyla.plot = q.percent %>%
plot_bar(fill="Phylum")+
geom_bar(aes(fill=Phylum), stat="identity")+
labs(y="Abundance (%)")+
scale_fill_manual(values=palette)+
theme(legend.text=element_text(size=11))
Figure 3 Relative distribution of phyla across samples from Saanich Inlet when clustered as OTUs using mothur.
Figure 4 Relative distribution of phyla across samples from Saanich Inlet when clustered as ASVs using QIIME2.
Mothur and QIIME2 identified 28 and 29 taxa, respectively, at the phylum level (Figures 3 & 4). Of the phyla identified by both mothur and QIIME2, four dominated the community composition in terms of abundance: Proteobacteria, Bacteroidetes, Thaumarchaeota and Actinobacteria (from most to least abundant). Other noticeably abundant phyla include Cyanobacteria, Deferribacteres, Euryarchaeota, Firmicutes, Gemmatimonadetes, Marinimicrobia, Nitrospinae, Planctomycetes and Verrucomicrobia. Our phylum of interest, Chloroflexi, makes up from 0 to 6% of the microbial community in the collected samples depending on depth.
# Get summary of linear model statistics for Chloroflexi abundance across depth for mothur
m.chlor.lm = m.norm %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ Depth_m, .) %>%
summary()
# Get summary of linear model statistics for Chloroflexi abundance across depth for QIIME2
q.chlor.lm = q.norm %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ Depth_m, .) %>%
summary()
# Make a data frame for linear model statistics data for depth
taxon.abundance = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
taxon.abundance <- rbind(taxon.abundance, m.chlor.lm$coefficients["Depth_m",])
taxon.abundance <- rbind(taxon.abundance, q.chlor.lm$coefficients["Depth_m",])
rownames(taxon.abundance) <- (c("mothur", "QIIME2"))
colnames(taxon.abundance) <- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
# Make a table for the data
kable(taxon.abundance,caption="Table 1 Correlation Data of Chloroflexi Phylum across Depth")
| | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|
| mothur | 1.327529 | 0.6389862 | 2.077554 | 0.0923485 |
| QIIME2 | 2.622128 | 0.4212043 | 6.225311 | 0.0015644 |
# Filter our phylum, group by sample and summarize abundance across depth for mothur
m.abd.depth.plot <- m.percent %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), Depth_m=mean(Depth_m)) %>%
# Use the data in a plot (abundance versus depth)
ggplot() +
geom_point(aes(x=Depth_m, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(Depth_m), y=Abundance_sum)) +
labs(title="mothur", y="Abundance (%)", x="Depth (m)")
# Filter our phylum, group by sample and summarize abundance across depth for QIIME2
q.abd.depth.plot <- q.percent %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), Depth_m=mean(Depth_m)) %>%
# Use the data in a plot (abundance versus depth)
ggplot() +
geom_point(aes(x=Depth_m, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(Depth_m), y=Abundance_sum)) +
labs(title="QIIME2", y=NULL, x="Depth (m)")
# Get summary of linear model statistics for Chloroflexi abundance across oxygen concentrations for mothur
m.chlor.lm.ox = m.norm %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ O2_uM, .) %>%
summary()
# Get summary of linear model statistics for Chloroflexi abundance across oxygen concentrations for QIIME2
q.chlor.lm.ox = q.norm %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
tax_glom(taxrank = 'Phylum') %>%
psmelt() %>%
lm(Abundance ~ O2_uM, .) %>%
summary()
# Make a data frame for linear model statistics data for oxygen concentrations
taxon.abundance.ox = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
taxon.abundance.ox <- rbind(taxon.abundance.ox, m.chlor.lm.ox$coefficients["O2_uM",])
taxon.abundance.ox <- rbind(taxon.abundance.ox, q.chlor.lm.ox$coefficients["O2_uM",])
rownames(taxon.abundance.ox) <- (c("mothur", "QIIME2"))
colnames(taxon.abundance.ox) <- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
# Make a table for the data
kable(taxon.abundance.ox,caption="Table 2 Correlation Data of Chloroflexi Phylum across Oxygen Concentration")
| | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|
| mothur | -0.750471 | 0.5865861 | -1.279387 | 0.2569128 |
| QIIME2 | -1.731996 | 0.5762708 | -3.005525 | 0.0299088 |
# Filter our phylum, group by sample and summarize abundance across oxygen concentration for mothur
m.abd.o2.plot <- m.percent %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), O2_uM=mean(O2_uM)) %>%
# Use the data in a plot (abundance versus oxygen)
ggplot() +
geom_point(aes(x=O2_uM, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(O2_uM), y=Abundance_sum)) +
labs(title="mothur", y="Abundance (%)", x="O2 (uM)")
# Filter our phylum, group by sample and summarize abundance across oxygen concentration for QIIME2
q.abd.o2.plot <- q.percent %>%
subset_taxa(Phylum==phylum_name_qiime2) %>%
psmelt() %>%
group_by(Sample) %>%
summarize(Abundance_sum=sum(Abundance), O2_uM=mean(O2_uM)) %>%
# Use the data in a plot (abundance versus oxygen)
ggplot() +
geom_point(aes(x=O2_uM, y=Abundance_sum)) +
geom_smooth(method='lm', aes(x=as.numeric(O2_uM), y=Abundance_sum)) +
labs(title="QIIME2", y=NULL, x="O2 (uM)")
# Plotting Chloroflexi abundance across depth
grid.arrange(m.abd.depth.plot, q.abd.depth.plot, ncol=2)
Figure 5 Relative abundance of Chloroflexi across depth for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey.
# Plotting Chloroflexi abundance across oxygen concentration
grid.arrange(m.abd.o2.plot, q.abd.o2.plot, ncol=2)
Figure 6 Relative abundance of Chloroflexi across oxygen concentration for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey.
Linear regression analysis of Chloroflexi relative abundance across depth revealed variations between mothur’s OTU and QIIME2’s ASV clustering (Figure 5). Abundance of ASV clusters revealed a significant correlation with depth (p<0.05), while OTU clusters did not (Table 1). Both correlations were found to be positive.
Similarly, linear regression analysis of Chloroflexi relative abundance across oxygen concentration (Figure 6) revealed a significant correlation of oxygen concentration with ASV clusters (p<0.05), but not with OTU clusters (Table 2). Both correlations were found to be negative.
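The quantities reported in Tables 1 and 2 are the slope row of the coefficient matrix returned by `summary(lm(...))`. As a minimal sketch, the example below fits the same kind of model to made-up abundances at the seven analyzed depths (the abundance values are hypothetical and are not the Saanich Inlet results).

```r
# Hypothetical abundances at the seven analyzed depths.
toy <- data.frame(
  Depth_m   = c(10, 100, 120, 135, 150, 165, 200),
  Abundance = c(0.2, 1.1, 2.0, 2.4, 3.1, 3.9, 5.8)  # made-up values (%)
)

fit <- summary(lm(Abundance ~ Depth_m, data = toy))

# The "Depth_m" row holds the slope, its standard error, t value and p-value.
slope_row <- fit$coefficients["Depth_m", ]
slope_row[["Estimate"]]   # sign gives the direction of the relationship
slope_row[["Pr(>|t|)"]]   # compared against 0.05 to call significance
fit$df[2]                 # residual degrees of freedom (n - 2)
```

With only seven points per model there are five residual degrees of freedom, so a single influential sample can move a fit across the 0.05 threshold.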
# Get OTUs taxa
m.tax_table = data.frame(m.norm@tax_table)
# Keep only OTUs assigned to our phylum
m.filtered = m.tax_table %>%
rownames_to_column('OTU') %>%
filter(Phylum==phylum_name_mothur) %>%
column_to_rownames('OTU')
# Get total number of OTUs in Chloroflexi
m.rownumber = nrow(m.filtered)
# Get names of classes in OTUs
m.classes = m.filtered %>%
select('Class') %>%
unique %>%
summarise(Classes = toString(Class))
# Get ASVs taxa
q.tax_table = data.frame(q.norm@tax_table)
# Keep only ASVs assigned to our phylum
q.filtered = q.tax_table %>%
rownames_to_column('ASV') %>%
filter(Phylum==phylum_name_qiime2) %>%
column_to_rownames('ASV')
# Get total number of ASVs in Chloroflexi
q.rownumber = nrow(q.filtered)
# Get names of classes in ASVs
q.classes = q.filtered %>%
select('Class') %>%
unique %>%
summarise(Classes = toString(Class))
For Chloroflexi, the number of OTUs was found to be 34, and the number of ASVs was found to be 47. The OTUs represent classes: SAR202_clade, Anaerolineae, while the ASVs represent classes: D_2__JG30-KF-CM66, D_2__Anaerolineae, D_2__uncultured, D_2__Dehalococcoidia, D_2__SAR202 clade.
# Example for linear model
# Make data frame that holds linear model statistics data (OTUs depth)
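otu_stats = data.frame("Estimate" = numeric(0), "Std. Error"= numeric(0),"t value"= numeric(0),"Pr(>|t|)"= numeric(0))
# Melt the phyloseq object once, outside the loop, rather than once per OTU
m.melted = psmelt(m.norm)
for (otu in row.names(m.filtered)){
  # Fit abundance versus depth for this OTU only
  linear_fit = m.melted %>%
    filter(OTU==otu) %>%
    lm(Abundance ~ Depth_m, .) %>%
    summary()
  # Keep the slope row (estimate, standard error, t value, p-value)
  otu_data = linear_fit$coefficients["Depth_m",]
  otu_stats <- rbind(otu_stats, otu_data)
}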
colnames(otu_stats)<- (c("Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
# Add OTU names
row.names(otu_stats) <- row.names(m.filtered)
# Add class and genus names
otu_stats = cbind(data.frame(Class = m.filtered$Class), Genus = m.filtered$Genus, otu_stats)
# Sort data frame by linear model slope (estimate)
sorted = arrange(rownames_to_column(otu_stats),Estimate)%>% column_to_rownames(var="rowname")
# Save data table in variable
lm.depth.otus = kable(sorted,caption="Table A1 Correlation data of Chloroflexi OTUs Abundance with Depth")
# Example for correlation graph
# Correlation graph for Chloroflexi OTUs percentage abundance across depth, first filtered for Chloroflexi, then plotted
m.percent %>%
subset_taxa(Phylum==phylum_name_mothur) %>%
psmelt() %>%
ggplot() +
geom_point(aes(x=Depth_m, y=Abundance)) +
geom_smooth(method='lm', aes(x=Depth_m, y=Abundance)) +
facet_wrap(~OTU, scales="free_y") +
labs(x = "Depth (m)", y = "Abundance (%)") +
theme(axis.text.x = element_text(angle = 90))
Figure 7 Relative abundances of Chloroflexi OTUs across depth for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey. None of the correlations were found to be significant. The exact p-values for each fit can be found in Table A1.
Figure 8 Relative abundances of Chloroflexi ASVs across depth for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey. None of the correlations were found to be significant. The exact p-values for each fit can be found in Table A2.
Figure 9 Relative abundances of Chloroflexi OTUs across oxygen concentration for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey. None of the correlations were found to be significant. The exact p-values for each fit can be found in Table A3.
Figure 10 Relative abundances of Chloroflexi ASVs across oxygen concentration for samples collected at Saanich Inlet. Points are fitted using linear models. 95% confidence bands are displayed in grey. None of the correlations were found to be significant. The exact p-values for each fit can be found in Table A4.
Linear model statistics were performed for the abundance of each OTU and ASV in relation to depth and oxygen concentration (Appendix A Table A1-A4). The linear models were subsequently plotted (Figures 7-10). No significant correlations were found between any individual OTUs/ASVs abundance and depth or oxygen concentration (p > 0.05 for all).
Although none of the correlations were significant, mothur and QIIME2 showed similar trends. For mothur, ten of the 34 OTUs correlated negatively with depth (the rest positively), while for QIIME2, nine of the 47 ASVs correlated negatively with depth (the rest positively). For abundance versus oxygen concentration, all mothur OTUs and all but one of the QIIME2 ASVs showed a negative correlation.
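As a rough illustration of how per-OTU statistics like those in the appendix tables can be produced, the sketch below fits one linear model per OTU and keeps the slope row of each coefficient table. It uses base R only. The data frame `toy`, its values, and the helper `fit_one` are hypothetical and are not the Saanich Inlet data or the exact code used for this report.

```r
# Toy long-format data (hypothetical values, not the Saanich Inlet data)
toy <- data.frame(
  OTU       = rep(c("OtuA", "OtuB"), each = 7),
  Depth_m   = rep(c(10, 100, 120, 135, 150, 165, 200), times = 2),
  Abundance = c(0.9, 0.7, 0.6, 0.5, 0.4, 0.2, 0.1,   # decreases with depth
                0.0, 0.1, 0.3, 0.2, 0.5, 0.6, 0.8)   # increases with depth
)

# Hypothetical helper: fit Abundance ~ Depth_m for one OTU and return
# the slope row (Estimate, Std. Error, t value, Pr(>|t|))
fit_one <- function(df) {
  coefs <- summary(lm(Abundance ~ Depth_m, data = df))$coefficients
  coefs["Depth_m", ]
}

# One model per OTU, stacked into a single table with OTUs as row names
lm_stats <- as.data.frame(do.call(rbind, lapply(split(toy, toy$OTU), fit_one)))
print(lm_stats)

# Tally how many OTUs have negative (-1) versus positive (1) slopes
print(table(sign(lm_stats$Estimate)))
```

Swapping `Depth_m` for an oxygen or hydrogen sulfide column would give the corresponding table for those variables.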
Figure 11 Relative abundances of Chloroflexi OTUs across hydrogen sulfide concentration for samples collected at Saanich Inlet. Lines show linear model fits, with 95% confidence bands in grey. Significance of each fit can be found in Table A5.
Figure 12 Relative abundances of Chloroflexi ASVs across hydrogen sulfide concentration for samples collected at Saanich Inlet. Lines show linear model fits, with 95% confidence bands in grey. Significance of each fit can be found in Table A6.
Linear models were fitted to the abundance of each OTU and ASV in relation to hydrogen sulfide concentration (Appendix A, Tables A5 & A6) and subsequently plotted (Figures 11 & 12). After p-value adjustment, significant positive correlations between abundance and hydrogen sulfide concentration were found for 15 individual OTUs and 19 individual ASVs (p < 0.05 in all cases). Classes associated with OTUs and ASVs positively correlating with hydrogen sulfide concentration include Anaerolineae, Dehalococcoidia and the SAR202 clade.
QIIME2 identified 9 more individual members of Chloroflexi that significantly and positively correlate with hydrogen sulfide concentration compared to mothur. In addition, the significance of the results and the positive correlation with hydrogen sulfide concentration were generally greater with QIIME2 than mothur.
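The p-value adjustment mentioned above can be sketched with base R's `p.adjust`. The appendix reports an "FDR Adjusted p-value" column, so the example assumes the Benjamini-Hochberg method (`method = "fdr"`). The raw p-values and OTU names are hypothetical and are not taken from Tables A5 or A6.

```r
# Hypothetical raw p-values, one per OTU (not taken from Tables A5 or A6)
p_raw <- c(OtuA = 0.001, OtuB = 0.012, OtuC = 0.040, OtuD = 0.200, OtuE = 0.650)

# Benjamini-Hochberg false discovery rate adjustment
# ("fdr" is an alias for "BH" in p.adjust)
p_fdr <- p.adjust(p_raw, method = "fdr")

# Keep only the OTUs that remain significant after adjustment
significant <- names(p_fdr)[p_fdr < 0.05]

print(data.frame(p_raw, p_fdr))
print(significant)
```

In this toy case OtuC is significant before adjustment (0.040) but not after, which is why the number of significant OTUs or ASVs should be counted on the adjusted values.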
The Saanich Inlet provides a good model for the study of oxygen minimum zones and the microbial dynamics that shape them. The microbial diversity analyses we carried out with the two bioinformatic pipelines, mothur and QIIME2, produced mostly similar patterns but differed in the details. Both pipelines placed the peak Shannon diversity index at 100 m; however, the peak Chao1 richness estimate for the total community was at 100 m for mothur and 120 m for QIIME2. We see the same discrepancy between the mothur and QIIME2 Chao1 richness estimates when looking at alpha-diversity across dissolved oxygen concentrations (Figure 2). As for discrepancies between Shannon diversity and Chao1 richness, at a dissolved oxygen concentration of ~217 μM, Chao1 was relatively lower than Shannon for both mothur and QIIME2. This could indicate increased species evenness despite reduced species richness, because Chao1 does not take evenness into account while Shannon does. The two pipelines also agreed that Saanich Inlet is dominated by only 4-5 phyla, with the phylum Chloroflexi making up 0-6% of the microbial community depending on the sample depth.
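To make the difference between the two indices concrete, here is a minimal base R sketch that computes Shannon diversity and a bias-corrected Chao1 estimate for two made-up communities. The functions `shannon` and `chao1` and the count vectors are illustrative assumptions. The analysis pipelines may compute these indices with different variants or corrections.

```r
# Shannon diversity: H = -sum(p * log(p)), sensitive to evenness
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
# where F1 and F2 are the numbers of singletons and doubletons.
# It depends on taxon counts and rare taxa, not on evenness.
chao1 <- function(counts) {
  s_obs <- sum(counts > 0)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

even   <- c(25, 25, 25, 25)     # 4 taxa, perfectly even
uneven <- c(94, 3, 1, 1, 1)     # 5 taxa, one dominant, three singletons

print(c(shannon_even = shannon(even), shannon_uneven = shannon(uneven)))
print(c(chao1_even = chao1(even), chao1_uneven = chao1(uneven)))
```

The even community has the higher Shannon index but the lower Chao1 estimate, so the two measures can rank the same samples in opposite order.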
The phylum Chloroflexi contains bacteria with different metabolic characteristics, such as aerobic thermophiles, anoxygenic phototrophs, and anaerobic halorespirers, which use halogenated organics as electron acceptors [10, 11]. In our analysis, we found that Chloroflexi abundance was positively correlated with depth (Table 1). This may be a result of oxygen concentrations decreasing with depth: a negative correlation between Chloroflexi abundance and oxygen concentration was also found (Table 2). Interestingly, these results were significant only in the QIIME2 analysis, not in the mothur analysis. This discrepancy, discussed later, shows that the choice of analysis method can determine whether results are significant. The results indicate a preference for members of Chloroflexi to inhabit anoxic habitats at depth within Saanich Inlet, which is supported by previously mentioned research on this phylum [11, 16, 21]. Since Chloroflexi encompasses vastly different classes of bacteria with disparate metabolic behavior, it is important to note that the classes within this phylum have different oxygen requirements, and some of them also thrive in oxic environments [11, 16, 21]. This may explain why the abundance versus depth and oxygen concentration results with mothur were not significant (Tables 1 & 2). However, we identified only a few classes within Chloroflexi from our data, and each of those classes appears to be anaerobic [14-18], so the potential outliers might be a result of something else, such as other nutrients. Anoxic oceanic zones such as the deep waters of Saanich Inlet may provide an environment in which anaerobic members of Chloroflexi can be more competitive than other phyla and dominate the microbial population.
The richness within Chloroflexi depends strongly on which bioinformatic pipeline is used for the analysis. Mothur identified only 34 OTUs occupying 2 classes within Chloroflexi, while QIIME2 identified 47 ASVs and 5 classes. However, QIIME2 was unable to identify any genera while mothur was (Tables A1-A3). Therefore, the required taxonomic depth may dictate which pipeline should be used in future analyses. Correlations between individual OTUs and ASVs within Chloroflexi and depth or oxygen concentration were not significant (Appendix A, Tables A1-A4). The linear models in Figures 7-10 show similar trends for the individual OTUs and ASVs with depth and oxygen concentration; however, glaring single outliers defy any correlation between the data points. These outliers are present in all analyses but do not occur at the same depths or oxygen levels for the various ASVs and OTUs studied, which may mean that there is no pattern or rationale for their occurrence. More data would make it possible either to legitimize these data points and integrate them into the results, or to confirm them as outliers and exclude them.
Since members of the Chloroflexi phylum have been associated with sulfur cycling, the hydrogen sulfide levels in Saanich Inlet were investigated in relation to the abundance of members of this phylum [19]. Strong, significant positive correlations were found between various members of the Chloroflexi phylum and hydrogen sulfide concentration. Classes correlating with hydrogen sulfide levels include Anaerolineae, Dehalococcoidia and the SAR202 clade (Figures 11 & 12; Appendix A, Tables A5 & A6). These results are supported by previous studies showing that members of the SAR202 cluster, which belongs to the Chloroflexi phylum, play a major role in the sulfur cycle in the dark water column. Some of these members have pathways for sulfur reduction and could be responsible for the hydrogen sulfide concentrations at depth. As previously stated, they also have the potential to metabolize a variety of organosulfur compounds [19]. In addition, single-cell genomics studies of members of the Dehalococcoidia class within the Chloroflexi phylum highlighted their association with marine sediments and sulfur cycling [13].
Differences between bioinformatic pipelines for microbial ecology data analysis may lead to different analytical outcomes, and can result in misidentifying species in different habitats or incorrectly determining trends. As we observed in this study, although mothur and QIIME2 often produced similar patterns, they did not agree on details. While QIIME2 led us to conclude that four classes (plus one uncultured class) of Chloroflexi were present in our samples, mothur identified only two. These differences are concerning: depending on the pipeline we use, we might miss organisms we have collected, or we might identify organisms that are in reality not there. Consequently, we may draw the wrong conclusions about ecosystems and the interplay of their inhabitants. Furthermore, whether we find significant correlations can also depend on the pipeline used. While Chloroflexi abundance correlated significantly with depth and oxygen concentration for QIIME2, the correlation was not significant for mothur. In fact, the significances differed by an order of magnitude. This emphasizes the concern that the significance of findings may rest on the choice of pipeline and clustering paradigm, leading to false positives and false negatives. It also highlights the importance of developing objective metrics to gauge the accuracy of both clustering paradigms.
The differences we observed between mothur and QIIME2 in this study can also be seen in other research. For instance, a study of the chicken cecum microbiome by Allali et al. revealed lower phylogenetic diversity (PD) values when UPARSE pipelines were used in comparison to de novo and open-reference QIIME pipelines [25]. However, species richness (S) values were comparable across pipelines. In addition, the number of assigned sequences for different sequencing platform runs was affected by the pipeline used for OTU picking (de novo vs. open-reference QIIME). The QIIME pipelines generated different relative abundances of specific genera in comparison to UPARSE. Moreover, differences in detection profiles, such as the number of unique species, were observed between pipelines. The number of OTUs and the taxonomic assignments also differed between pipelines at a 99% similarity threshold.
Since multiple classes were identified within the phylum Chloroflexi and the direction of correlation with depth varied among several clusters, future analyses may find more meaning at a sub-phylum level. Additionally, more data were obtained than were analyzed in the current report. In the future, one could look for correlations of abundance with other factors, such as temperature or salinity, not just depth, oxygen concentration, and hydrogen sulfide. Furthermore, there is a gap in the data between the depths of 10 m and 100 m, which makes it difficult to determine correlations. More consistent data collection, with more samples at more regular intervals, could help alleviate such problems and potentially reveal significant correlations. Analysis of samples collected over time could show how the diversity in the area changes over seasons, or over longer periods such as decades. For unknown or poorly characterized organisms, it could also be of interest to examine their genetic makeup in more detail in order to determine what roles, if any, they might play in biogeochemical cycles.
[1] Zaikova E., Walsh DA, Stilwell CP, Mohn WW, Tortell PD, Hallam SJ. 2010. Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia. Environmental Microbiology 12:172-191.
[2] Herlinveaux RH. 2011. Journal of the Fisheries Research Board of Canada 19: 1-37.
[3] 2012. Saanich Inlet. MicrobeWiki.
[4] Oxygen Minimum Zones. Keil Lab: Aquatic Organic Geochemistry, UW Oceanography.
[5] 2014. OMZ Microbes - A SCOR working group.
[6] Blaxter M, Mann J, Chapman T, Thomas F, Whitton C, Floyd R, Abebe E. 2005. Defining operational taxonomic units using DNA barcode data. Philosophical Transactions of the Royal Society B: Biological Sciences 360: 1935–1943.
[7] Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker gene data analysis. The ISME Journal 11: 2639–2643.
[8] Schmidt TSB, Rodrigues JFM, Christian M. 2014. Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale. PLoS Comput Biol. 10.
[9] Wang Y, Sheng H, He Y, Wu J, Jiang Y, Tam NF, Zhou H. 2012. Comparison of the levels of bacterial diversity in freshwater, intertidal wetland, and marine sediments by using millions of Illumina tags. Applied and Environmental Microbiology 78: 8264–8271.
[10] Thiel V, Hamilton TL, Tomsho LP, Burhans R, Gay SE, Schuster SC, et al. 2014. Draft genome sequence of a sulfide-oxidizing, autotrophic filamentous anoxygenic phototrophic bacterium, Chloroflexus sp. strain MS-G (Chloroflexi). Genome Announc 2: 9–10.
[11] Sekiguchi Y, Yamada T, Hanada S, Ohashi A, Harada H, Kamagata Y. 2003. Anaerolinea thermophila gen. nov., sp. nov. and Caldilinea aerophila gen. nov., sp. nov., novel filamentous thermophiles that represent a previously uncultured lineage of the domain bacteria at the subphylum level. Int J Syst Evol Microbiol 53: 1843–51.
[12] Kiss H, Nett M, Domin N, Martin K, Maresca JA, Copeland A, Lapidus A, Lucas S, Berry KW, Rio TGD, Dalin E, Tice H, Pitluck S, Richardson P, Bruce D, Goodwin L, Han C, Detter JC, Schmutz J, Brettin T, Land M, Hauser L, Kyrpides NC, Ivanova N, Göker M, Woyke T, Klenk H-P, Bryant DA. 2011. Complete genome sequence of the filamentous gliding predatory bacterium Herpetosiphon aurantiacus type strain (114-95T). Standards in Genomic Sciences 5: 356–370.
[13] Wasmund K, Cooper M, Schreiber L, Lloyd KG, Baker BJ, Petersen DG, Jørgensen BB, Stepanauskas R, Reinhardt R, Schramm A, Loy A, Adrian L. 2016. Single-Cell Genome and Group-Specific dsrAB Sequencing Implicate Marine Members of the Class Dehalococcoidia (Phylum Chloroflexi) in Sulfur Cycling. mBio 7.
[14] Biderre-Petit C, Dugat-Bony E, Mege M, Parisot N, Adrian L, Moné A, Denonfoux J, Peyretaillade E, Debroas D, Boucher D, Peyret P. 2016. Distribution of Dehalococcoidia in the Anaerobic Deep Water of a Remote Meromictic Crater Lake and Detection of Dehalococcoidia-Derived Reductive Dehalogenase Homologous Genes. Plos One 11.
[15] Hugenholtz P, Goebel BM, Pace NR. 1998. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. Journal of Bacteriology 18: 4765–4774.
[16] Xia Y, Wang Y, Wang Y, Chin FYL, Zhang T. 2016. Cellular adhesiveness and cellulolytic capacity in Anaerolineae revealed by omics-based genome interpretation. Biotechnology for Biofuels 9.
[17] Giovannoni SJ, Rappe MS, Vergin KL, Adair NL. 1996. 16S rRNA genes reveal stratified open ocean bacterioplankton populations related to the Green Non-Sulfur bacteria. Proceedings of the National Academy of Sciences 93:7979–7984.
[18] Morris RM, Rappé MS, Urbach E, Connon SA, Giovannoni SJ. 2004. Prevalence of the Chloroflexi-related SAR202 bacterioplankton cluster throughout the mesopelagic zone and deep ocean. Appl Environ Microbiol 70: 2836–42.
[19] Mehrshad M, Rodriguez-Valera F, Amoozegar MA, López-García P, Ghai R. 2017. The enigmatic SAR202 cluster up close: shedding light on a globally distributed dark ocean lineage involved in sulfur cycling. The ISME Journal 12: 655–668.
[20] Wegner C-E, Liesack W. 2017. Unexpected Dominance of Elusive Acidobacteria in Early Industrial Soft Coal Slags. Frontiers in Microbiology 8.
[21] Ye Q, Wu Y, Zhu Z, Wang X, Li Z, Zhang J. 2016. Bacterial diversity in the surface sediments of the hypoxic zone near the Changjiang Estuary and in the East China Sea. MicrobiologyOpen 5: 323–339.
[22] Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Scofield M, Payne C, Pakhomova L, Kheirandish S, Finke J, Bhatia M, Shevchuk O, Gies EA, Fairley D, Michiels C, Suttle CA, Whitney F, Crowe SA, Tortell PD, Hallam SJ. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data 4: 170159.
[23] Hawley AK, Torres-Beltrán M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchuk O, Gies EA, Fairley D, Malfatti SA, Norbeck AD, Brewer HM, Pasa-Tolic L, del Rio TG, Suttle CA, Tringe S, Hallam SJ. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Sci Data 4: 170160.
[24] R Core Team. 2017. R: A language and environment for statistical computing. 3.4.3. R Foundation for Statistical Computing, Vienna, Austria.
[25] Allali I, Arnold JW, Roach J, Cadenas MB, Butz N, Hassan HM, Koci M, Ballou A, Mendoza M, Ali R, Azcarate-Peril MA. 2017. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome. BMC Microbiology 17: 1–16.
| OTU | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Otu0181 | SAR202_clade | SAR202_clade_ge | -0.1713584 | 0.3794814 | -0.4515595 | 0.6704985 |
| Otu1579 | SAR202_clade | SAR202_clade_ge | -0.0073322 | 0.0162544 | -0.4510917 | 0.6708136 |
| Otu1149 | SAR202_clade | SAR202_clade_ge | -0.0035352 | 0.0082586 | -0.4280621 | 0.6864177 |
| Otu4286 | SAR202_clade | SAR202_clade_ge | -0.0035352 | 0.0082586 | -0.4280621 | 0.6864177 |
| Otu1064 | SAR202_clade | SAR202_clade_ge | -0.0027496 | 0.0165362 | -0.1662767 | 0.8744539 |
| Otu2632 | SAR202_clade | SAR202_clade_ge | -0.0023568 | 0.0055057 | -0.4280621 | 0.6864177 |
| Otu4287 | SAR202_clade | SAR202_clade_ge | -0.0011784 | 0.0027529 | -0.4280621 | 0.6864177 |
| Otu2381 | Anaerolineae | uncultured | -0.0005237 | 0.0056008 | -0.0935100 | 0.9291298 |
| Otu2592 | SAR202_clade | SAR202_clade_ge | -0.0005237 | 0.0056008 | -0.0935100 | 0.9291298 |
| Otu2591 | SAR202_clade | SAR202_clade_ge | -0.0002619 | 0.0028004 | -0.0935100 | 0.9291298 |
| Otu1577 | SAR202_clade | SAR202_clade_ge | 0.0001637 | 0.0036177 | 0.0452401 | 0.9656672 |
| Otu3712 | Anaerolineae | uncultured | 0.0008511 | 0.0055928 | 0.1521723 | 0.8850009 |
| Otu3607 | Anaerolineae | uncultured | 0.0034043 | 0.0023533 | 1.4465667 | 0.2076595 |
| Otu2790 | Anaerolineae | Thermomarinilinea | 0.0034043 | 0.0023533 | 1.4465667 | 0.2076595 |
| Otu3623 | Anaerolineae | Thermomarinilinea | 0.0036007 | 0.0053694 | 0.6705821 | 0.5322101 |
| Otu4340 | Anaerolineae | Anaerolineaceae_unclassified | 0.0068085 | 0.0047067 | 1.4465667 | 0.2076595 |
| Otu2789 | Anaerolineae | Thermomarinilinea | 0.0068085 | 0.0047067 | 1.4465667 | 0.2076595 |
| Otu1558 | Anaerolineae | Anaerolineaceae_unclassified | 0.0070049 | 0.0049223 | 1.4231039 | 0.2139907 |
| Otu1863 | Anaerolineae | Pelolinea | 0.0079214 | 0.0046360 | 1.7086714 | 0.1482107 |
| Otu3589 | Anaerolineae | uncultured | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Otu1419 | Anaerolineae | Thermomarinilinea | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Otu2497 | Anaerolineae | Pelolinea | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Otu1147 | Anaerolineae | Thermomarinilinea | 0.0146645 | 0.0110438 | 1.3278449 | 0.2416173 |
| Otu1983 | Anaerolineae | Thermomarinilinea | 0.0158101 | 0.0123144 | 1.2838745 | 0.2554599 |
| Otu1246 | Anaerolineae | Thermomarinilinea | 0.0158429 | 0.0092720 | 1.7086714 | 0.1482107 |
| Otu1851 | Anaerolineae | Anaerolineaceae_unclassified | 0.0170213 | 0.0117667 | 1.4465667 | 0.2076595 |
| Otu0662 | Anaerolineae | Thermomarinilinea | 0.0340426 | 0.0235333 | 1.4465667 | 0.2076595 |
| Otu0551 | Anaerolineae | Thermomarinilinea | 0.0365957 | 0.0465283 | 0.7865264 | 0.4671821 |
| Otu1028 | Anaerolineae | Thermomarinilinea | 0.0374468 | 0.0258867 | 1.4465667 | 0.2076595 |
| Otu0607 | Anaerolineae | Thermomarinilinea | 0.0389853 | 0.0438394 | 0.8892756 | 0.4145865 |
| Otu0799 | Anaerolineae | Pelolinea | 0.0477578 | 0.0280660 | 1.7016226 | 0.1495636 |
| Otu0217 | Anaerolineae | Anaerolineaceae_unclassified | 0.1946645 | 0.2102439 | 0.9258985 | 0.3969899 |
| Otu0215 | Anaerolineae | Thermomarinilinea | 0.4527660 | 0.3129935 | 1.4465667 | 0.2076595 |
| Otu0195 | Anaerolineae | Anaerolineaceae_unclassified | 0.5344681 | 0.3694735 | 1.4465667 | 0.2076595 |
| ASV | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Asv1886 | D_2__SAR202 clade | D_5__ | -0.3397709 | 0.2301978 | -1.4759954 | 0.1999714 |
| Asv800 | D_2__SAR202 clade | D_5__ | -0.1327332 | 0.3683890 | -0.3603073 | 0.7333378 |
| Asv1266 | D_2__SAR202 clade | D_5__ | -0.0329951 | 0.0770801 | -0.4280621 | 0.6864177 |
| Asv1289 | D_2__Anaerolineae | D_5__uncultured | -0.0164975 | 0.0385401 | -0.4280621 | 0.6864177 |
| Asv1979 | D_2__Anaerolineae | D_5__uncultured | -0.0057610 | 0.0616089 | -0.0935100 | 0.9291298 |
| Asv1144 | D_2__SAR202 clade | D_5__ | -0.0039280 | 0.1354082 | -0.0290085 | 0.9779801 |
| Asv341 | D_2__JG30-KF-CM66 | D_5__ | -0.0035352 | 0.0082586 | -0.4280621 | 0.6864177 |
| Asv1862 | D_2__Anaerolineae | D_5__uncultured | -0.0034043 | 0.0364052 | -0.0935100 | 0.9291298 |
| Asv1260 | D_2__Anaerolineae | D_5__uncultured | -0.0007856 | 0.0084012 | -0.0935100 | 0.9291298 |
| Asv2081 | D_2__Anaerolineae | D_5__uncultured | 0.0011129 | 0.0027583 | 0.4034830 | 0.7032691 |
| Asv2034 | D_2__SAR202 clade | D_5__ | 0.0038298 | 0.0251674 | 0.1521723 | 0.8850009 |
| Asv1142 | D_2__JG30-KF-CM66 | D_5__ | 0.0057610 | 0.0801057 | 0.0719180 | 0.9454553 |
| Asv1046 | D_2__Anaerolineae | D_5__uncultured | 0.0111293 | 0.0275831 | 0.4034830 | 0.7032691 |
| Asv2247 | D_2__JG30-KF-CM66 | D_5__ | 0.0126023 | 0.0187931 | 0.6705821 | 0.5322101 |
| Asv400 | D_2__uncultured | D_5__ | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Asv496 | D_2__Dehalococcoidia | NA | 0.0136170 | 0.0094133 | 1.4465667 | 0.2076595 |
| Asv2063 | D_2__Anaerolineae | NA | 0.0189198 | 0.0468912 | 0.4034830 | 0.7032691 |
| Asv134 | D_2__Anaerolineae | NA | 0.0234043 | 0.0349014 | 0.6705821 | 0.5322101 |
| Asv1473 | D_2__SAR202 clade | NA | 0.0238298 | 0.0164733 | 1.4465667 | 0.2076595 |
| Asv1794 | D_2__Anaerolineae | D_5__uncultured | 0.0238298 | 0.0164733 | 1.4465667 | 0.2076595 |
| Asv1234 | D_2__SAR202 clade | D_5__ | 0.0272340 | 0.0188267 | 1.4465667 | 0.2076595 |
| Asv477 | D_2__Anaerolineae | D_5__uncultured | 0.0288052 | 0.0429556 | 0.6705821 | 0.5322101 |
| Asv590 | D_2__Anaerolineae | D_5__uncultured | 0.0306383 | 0.0211800 | 1.4465667 | 0.2076595 |
| Asv1003 | D_2__SAR202 clade | D_5__ | 0.0306383 | 0.0211800 | 1.4465667 | 0.2076595 |
| Asv1282 | D_2__Anaerolineae | D_5__uncultured | 0.0340426 | 0.0235333 | 1.4465667 | 0.2076595 |
| Asv490 | D_2__Anaerolineae | D_5__uncultured | 0.0396072 | 0.0859823 | 0.4606434 | 0.6643958 |
| Asv1664 | D_2__Anaerolineae | D_5__uncultured | 0.0414075 | 0.0617486 | 0.6705821 | 0.5322101 |
| Asv1939 | D_2__Anaerolineae | D_5__Longilinea | 0.0418331 | 0.0911012 | 0.4591934 | 0.6653680 |
| Asv1163 | D_2__Anaerolineae | D_5__uncultured | 0.0476596 | 0.0329467 | 1.4465667 | 0.2076595 |
| Asv473 | D_2__Anaerolineae | D_5__uncultured | 0.0522095 | 0.0778570 | 0.6705821 | 0.5322101 |
| Asv2315 | D_2__Anaerolineae | D_5__uncultured | 0.0578723 | 0.0400067 | 1.4465667 | 0.2076595 |
| Asv1693 | D_2__Anaerolineae | D_5__uncultured | 0.0583633 | 0.0676342 | 0.8629259 | 0.4276204 |
| Asv555 | D_2__Anaerolineae | D_5__uncultured | 0.0748936 | 0.0517734 | 1.4465667 | 0.2076595 |
| Asv1943 | D_2__Anaerolineae | D_5__uncultured | 0.0792144 | 0.1181278 | 0.6705821 | 0.5322101 |
| Asv428 | D_2__Anaerolineae | D_5__uncultured | 0.0955810 | 0.1216822 | 0.7854968 | 0.4677332 |
| Asv114 | D_2__JG30-KF-CM66 | D_5__ | 0.1054664 | 0.1601592 | 0.6585100 | 0.5393201 |
| Asv2324 | D_2__Anaerolineae | NA | 0.1089362 | 0.0753067 | 1.4465667 | 0.2076595 |
| Asv1423 | D_2__Anaerolineae | D_5__Longilinea | 0.1123404 | 0.0776600 | 1.4465667 | 0.2076595 |
| Asv1505 | D_2__Anaerolineae | D_5__uncultured | 0.1123404 | 0.0776600 | 1.4465667 | 0.2076595 |
| Asv271 | D_2__Anaerolineae | D_5__uncultured | 0.1361702 | 0.0941334 | 1.4465667 | 0.2076595 |
| Asv208 | D_2__Anaerolineae | D_5__uncultured | 0.1468412 | 0.1522869 | 0.9642409 | 0.3792105 |
| Asv1095 | D_2__Anaerolineae | NA | 0.1634043 | 0.1129601 | 1.4465667 | 0.2076595 |
| Asv161 | D_2__Anaerolineae | D_5__uncultured | 0.1668085 | 0.1153134 | 1.4465667 | 0.2076595 |
| Asv1108 | D_2__Anaerolineae | D_5__uncultured | 0.1669722 | 0.1917355 | 0.8708462 | 0.4236697 |
| Asv408 | D_2__Anaerolineae | D_5__uncultured | 0.1859247 | 0.2109964 | 0.8811750 | 0.4185601 |
| Asv1071 | D_2__Anaerolineae | D_5__uncultured | 0.3438298 | 0.2376868 | 1.4465667 | 0.2076595 |
| Asv1749 | D_2__Anaerolineae | D_5__uncultured | 0.5208511 | 0.3600602 | 1.4465667 | 0.2076595 |
| OTU | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Otu0195 | Anaerolineae | Anaerolineaceae_unclassified | -0.1897296 | 0.3302310 | -0.5745358 | 0.5904867 |
| Otu0217 | Anaerolineae | Anaerolineaceae_unclassified | -0.1736086 | 0.1582994 | -1.0967102 | 0.3227568 |
| Otu0215 | Anaerolineae | Thermomarinilinea | -0.1607263 | 0.2797499 | -0.5745358 | 0.5904867 |
| Otu0181 | SAR202_clade | SAR202_clade_ge | -0.0520091 | 0.2990620 | -0.1739074 | 0.8687601 |
| Otu0607 | Anaerolineae | Thermomarinilinea | -0.0317385 | 0.0336870 | -0.9421584 | 0.3893700 |
| Otu0551 | Anaerolineae | Thermomarinilinea | -0.0269047 | 0.0362727 | -0.7417334 | 0.4915977 |
| Otu0799 | Anaerolineae | Pelolinea | -0.0200649 | 0.0258114 | -0.7773675 | 0.4721013 |
| Otu1028 | Anaerolineae | Thermomarinilinea | -0.0132932 | 0.0231372 | -0.5745358 | 0.5904867 |
| Otu0662 | Anaerolineae | Thermomarinilinea | -0.0120847 | 0.0210338 | -0.5745358 | 0.5904867 |
| Otu1983 | Anaerolineae | Thermomarinilinea | -0.0084593 | 0.0103315 | -0.8187858 | 0.4501563 |
| Otu1147 | Anaerolineae | Thermomarinilinea | -0.0084593 | 0.0092049 | -0.9189965 | 0.4002601 |
| Otu1246 | Anaerolineae | Thermomarinilinea | -0.0072508 | 0.0084400 | -0.8590967 | 0.4295406 |
| Otu1851 | Anaerolineae | Anaerolineaceae_unclassified | -0.0060423 | 0.0105169 | -0.5745358 | 0.5904867 |
| Otu1419 | Anaerolineae | Thermomarinilinea | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Otu2497 | Anaerolineae | Pelolinea | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Otu3589 | Anaerolineae | uncultured | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Otu1558 | Anaerolineae | Anaerolineaceae_unclassified | -0.0036254 | 0.0042200 | -0.8590967 | 0.4295406 |
| Otu1863 | Anaerolineae | Pelolinea | -0.0036254 | 0.0042200 | -0.8590967 | 0.4295406 |
| Otu2789 | Anaerolineae | Thermomarinilinea | -0.0024169 | 0.0042068 | -0.5745358 | 0.5904867 |
| Otu4340 | Anaerolineae | Anaerolineaceae_unclassified | -0.0024169 | 0.0042068 | -0.5745358 | 0.5904867 |
| Otu3623 | Anaerolineae | Thermomarinilinea | -0.0024169 | 0.0042068 | -0.5745358 | 0.5904867 |
| Otu1064 | SAR202_clade | SAR202_clade_ge | -0.0020728 | 0.0128145 | -0.1617557 | 0.8778315 |
| Otu1579 | SAR202_clade | SAR202_clade_ge | -0.0012945 | 0.0128349 | -0.1008584 | 0.9235824 |
| Otu3712 | Anaerolineae | uncultured | -0.0012919 | 0.0043048 | -0.3001126 | 0.7761680 |
| Otu2790 | Anaerolineae | Thermomarinilinea | -0.0012085 | 0.0021034 | -0.5745358 | 0.5904867 |
| Otu3607 | Anaerolineae | uncultured | -0.0012085 | 0.0021034 | -0.5745358 | 0.5904867 |
| Otu1577 | SAR202_clade | SAR202_clade_ge | -0.0009643 | 0.0027703 | -0.3480925 | 0.7419469 |
| Otu2592 | SAR202_clade | SAR202_clade_ge | -0.0006367 | 0.0043341 | -0.1469078 | 0.8889445 |
| Otu2381 | Anaerolineae | uncultured | -0.0006367 | 0.0043341 | -0.1469078 | 0.8889445 |
| Otu1149 | SAR202_clade | SAR202_clade_ge | -0.0004881 | 0.0065115 | -0.0749568 | 0.9431557 |
| Otu4286 | SAR202_clade | SAR202_clade_ge | -0.0004881 | 0.0065115 | -0.0749568 | 0.9431557 |
| Otu2632 | SAR202_clade | SAR202_clade_ge | -0.0003254 | 0.0043410 | -0.0749568 | 0.9431557 |
| Otu2591 | SAR202_clade | SAR202_clade_ge | -0.0003184 | 0.0021670 | -0.1469078 | 0.8889445 |
| Otu4287 | SAR202_clade | SAR202_clade_ge | -0.0001627 | 0.0021705 | -0.0749568 | 0.9431557 |
| ASV | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|---|---|
| Asv1749 | D_2__Anaerolineae | D_5__uncultured | -0.1848957 | 0.3218175 | -0.5745358 | 0.5904867 |
| Asv408 | D_2__Anaerolineae | D_5__uncultured | -0.1764060 | 0.1570152 | -1.1234959 | 0.3122557 |
| Asv1108 | D_2__Anaerolineae | D_5__uncultured | -0.1646951 | 0.1413959 | -1.1647802 | 0.2966535 |
| Asv208 | D_2__Anaerolineae | D_5__uncultured | -0.1439410 | 0.1112113 | -1.2943021 | 0.2521126 |
| Asv1071 | D_2__Anaerolineae | D_5__uncultured | -0.1220553 | 0.2124416 | -0.5745358 | 0.5904867 |
| Asv428 | D_2__Anaerolineae | D_5__uncultured | -0.1061416 | 0.0879361 | -1.2070308 | 0.2814027 |
| Asv114 | D_2__JG30-KF-CM66 | D_5__ | -0.0969221 | 0.1218861 | -0.7951862 | 0.4625657 |
| Asv800 | D_2__SAR202 clade | D_5__ | -0.0875482 | 0.2864534 | -0.3056281 | 0.7722028 |
| Asv161 | D_2__Anaerolineae | D_5__uncultured | -0.0592150 | 0.1030657 | -0.5745358 | 0.5904867 |
| Asv1095 | D_2__Anaerolineae | NA | -0.0580065 | 0.1009624 | -0.5745358 | 0.5904867 |
| Asv1943 | D_2__Anaerolineae | D_5__uncultured | -0.0531726 | 0.0925488 | -0.5745358 | 0.5904867 |
| Asv271 | D_2__Anaerolineae | D_5__uncultured | -0.0483387 | 0.0841353 | -0.5745358 | 0.5904867 |
| Asv1939 | D_2__Anaerolineae | D_5__Longilinea | -0.0476310 | 0.0688397 | -0.6919125 | 0.5198017 |
| Asv490 | D_2__Anaerolineae | D_5__uncultured | -0.0452141 | 0.0649448 | -0.6961928 | 0.5173357 |
| Asv1693 | D_2__Anaerolineae | D_5__uncultured | -0.0447133 | 0.0524914 | -0.8518223 | 0.4332067 |
| Asv1505 | D_2__Anaerolineae | D_5__uncultured | -0.0398795 | 0.0694116 | -0.5745358 | 0.5904867 |
| Asv1423 | D_2__Anaerolineae | D_5__Longilinea | -0.0398795 | 0.0694116 | -0.5745358 | 0.5904867 |
| Asv2324 | D_2__Anaerolineae | NA | -0.0386710 | 0.0673082 | -0.5745358 | 0.5904867 |
| Asv473 | D_2__Anaerolineae | D_5__uncultured | -0.0350456 | 0.0609981 | -0.5745358 | 0.5904867 |
| Asv1664 | D_2__Anaerolineae | D_5__uncultured | -0.0277948 | 0.0483778 | -0.5745358 | 0.5904867 |
| Asv555 | D_2__Anaerolineae | D_5__uncultured | -0.0265863 | 0.0462744 | -0.5745358 | 0.5904867 |
| Asv1144 | D_2__SAR202 clade | D_5__ | -0.0252671 | 0.1043155 | -0.2422180 | 0.8182327 |
| Asv2063 | D_2__Anaerolineae | NA | -0.0205440 | 0.0357575 | -0.5745358 | 0.5904867 |
| Asv2315 | D_2__Anaerolineae | D_5__uncultured | -0.0205440 | 0.0357575 | -0.5745358 | 0.5904867 |
| Asv477 | D_2__Anaerolineae | D_5__uncultured | -0.0193355 | 0.0336541 | -0.5745358 | 0.5904867 |
| Asv1163 | D_2__Anaerolineae | D_5__uncultured | -0.0169186 | 0.0294474 | -0.5745358 | 0.5904867 |
| Asv134 | D_2__Anaerolineae | NA | -0.0157101 | 0.0273440 | -0.5745358 | 0.5904867 |
| Asv1282 | D_2__Anaerolineae | D_5__uncultured | -0.0120847 | 0.0210338 | -0.5745358 | 0.5904867 |
| Asv1046 | D_2__Anaerolineae | D_5__uncultured | -0.0120847 | 0.0210338 | -0.5745358 | 0.5904867 |
| Asv1142 | D_2__JG30-KF-CM66 | D_5__ | -0.0112835 | 0.0618942 | -0.1823031 | 0.8625058 |
| Asv590 | D_2__Anaerolineae | D_5__uncultured | -0.0108762 | 0.0189304 | -0.5745358 | 0.5904867 |
| Asv1003 | D_2__SAR202 clade | D_5__ | -0.0108762 | 0.0189304 | -0.5745358 | 0.5904867 |
| Asv1234 | D_2__SAR202 clade | D_5__ | -0.0096677 | 0.0168271 | -0.5745358 | 0.5904867 |
| Asv1794 | D_2__Anaerolineae | D_5__uncultured | -0.0084593 | 0.0147237 | -0.5745358 | 0.5904867 |
| Asv1473 | D_2__SAR202 clade | NA | -0.0084593 | 0.0147237 | -0.5745358 | 0.5904867 |
| Asv2247 | D_2__JG30-KF-CM66 | D_5__ | -0.0084593 | 0.0147237 | -0.5745358 | 0.5904867 |
| Asv1979 | D_2__Anaerolineae | D_5__uncultured | -0.0070038 | 0.0476747 | -0.1469078 | 0.8889445 |
| Asv2034 | D_2__SAR202 clade | D_5__ | -0.0058137 | 0.0193716 | -0.3001126 | 0.7761680 |
| Asv496 | D_2__Dehalococcoidia | NA | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Asv400 | D_2__uncultured | D_5__ | -0.0048339 | 0.0084135 | -0.5745358 | 0.5904867 |
| Asv1266 | D_2__SAR202 clade | D_5__ | -0.0045554 | 0.0607736 | -0.0749568 | 0.9431557 |
| Asv1862 | D_2__Anaerolineae | D_5__uncultured | -0.0041386 | 0.0281714 | -0.1469078 | 0.8889445 |
| Asv1289 | D_2__Anaerolineae | D_5__uncultured | -0.0022777 | 0.0303868 | -0.0749568 | 0.9431557 |
| Asv2081 | D_2__Anaerolineae | D_5__uncultured | -0.0012085 | 0.0021034 | -0.5745358 | 0.5904867 |
| Asv1260 | D_2__Anaerolineae | D_5__uncultured | -0.0009551 | 0.0065011 | -0.1469078 | 0.8889445 |
| Asv341 | D_2__JG30-KF-CM66 | D_5__ | -0.0004881 | 0.0065115 | -0.0749568 | 0.9431557 |
| Asv1886 | D_2__SAR202 clade | D_5__ | 0.1614351 | 0.2011515 | 0.8025547 | 0.4586642 |

| OTU | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) | FDR Adjusted p-value |
|---|---|---|---|---|---|---|---|
| Otu0181 | SAR202_clade | SAR202_clade_ge | -2.4575728 | 3.3035421 | -0.7439205 | 0.4903847 | 0.7285229 |
| Otu0217 | Anaerolineae | Anaerolineaceae_unclassified | -0.9242423 | 2.0042270 | -0.4611465 | 0.6640586 | 0.7285229 |
| Otu0607 | Anaerolineae | Thermomarinilinea | -0.1124721 | 0.4212890 | -0.2669714 | 0.8001524 | 0.8501620 |
| Otu1064 | SAR202_clade | SAR202_clade_ge | -0.0796436 | 0.1448049 | -0.5500061 | 0.6059809 | 0.7285229 |
| Otu1579 | SAR202_clade | SAR202_clade_ge | -0.0796436 | 0.1448049 | -0.5500061 | 0.6059809 | 0.7285229 |
| Otu1149 | SAR202_clade | SAR202_clade_ge | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu4286 | SAR202_clade | SAR202_clade_ge | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu0551 | Anaerolineae | Thermomarinilinea | -0.0280166 | 0.4433827 | -0.0631883 | 0.9520649 | 0.9520649 |
| Otu2381 | Anaerolineae | uncultured | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu2592 | SAR202_clade | SAR202_clade_ge | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu2632 | SAR202_clade | SAR202_clade_ge | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu3712 | Anaerolineae | uncultured | -0.0227553 | 0.0493743 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu1577 | SAR202_clade | SAR202_clade_ge | -0.0227553 | 0.0309087 | -0.7362104 | 0.4946701 | 0.7285229 |
| Otu2591 | SAR202_clade | SAR202_clade_ge | -0.0113777 | 0.0246871 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu4287 | SAR202_clade | SAR202_clade_ge | -0.0113777 | 0.0246871 | -0.4608737 | 0.6642415 | 0.7285229 |
| Otu3623 | Anaerolineae | Thermomarinilinea | 0.0032080 | 0.0503917 | 0.0636606 | 0.9517072 | 0.9520649 |
| Otu3607 | Anaerolineae | uncultured | 0.0552843 | 0.0049066 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu2790 | Anaerolineae | Thermomarinilinea | 0.0552843 | 0.0049066 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu1558 | Anaerolineae | Anaerolineaceae_unclassified | 0.0584922 | 0.0454851 | 1.2859653 | 0.2547855 | 0.5414192 |
| Otu1863 | Anaerolineae | Pelolinea | 0.0991909 | 0.0280249 | 3.5393861 | 0.0165734 | 0.0375663 |
| Otu2789 | Anaerolineae | Thermomarinilinea | 0.1105686 | 0.0098132 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu4340 | Anaerolineae | Anaerolineaceae_unclassified | 0.1105686 | 0.0098132 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu1983 | Anaerolineae | Thermomarinilinea | 0.1185885 | 0.1161660 | 1.0208533 | 0.3541507 | 0.6689512 |
| Otu1147 | Anaerolineae | Thermomarinilinea | 0.1203422 | 0.1022047 | 1.1774631 | 0.2920002 | 0.5840003 |
| Otu1246 | Anaerolineae | Thermomarinilinea | 0.1983818 | 0.0560498 | 3.5393861 | 0.0165734 | 0.0375663 |
| Otu3589 | Anaerolineae | uncultured | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu1419 | Anaerolineae | Thermomarinilinea | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu2497 | Anaerolineae | Pelolinea | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu1851 | Anaerolineae | Anaerolineaceae_unclassified | 0.2764214 | 0.0245330 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu0662 | Anaerolineae | Thermomarinilinea | 0.5528428 | 0.0490660 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu1028 | Anaerolineae | Thermomarinilinea | 0.6081271 | 0.0539726 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu0799 | Anaerolineae | Pelolinea | 0.6618074 | 0.1140109 | 5.8047714 | 0.0021396 | 0.0055959 |
| Otu0215 | Anaerolineae | Thermomarinilinea | 7.3528091 | 0.6525773 | 11.2673382 | 0.0000962 | 0.0002726 |
| Otu0195 | Anaerolineae | Anaerolineaceae_unclassified | 8.6796317 | 0.7703356 | 11.2673382 | 0.0000962 | 0.0002726 |

| ASV | Class | Genus | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) | FDR Adjusted p-value |
|---|---|---|---|---|---|---|---|
| Asv800 | D_2__SAR202 clade | D_5__ | -2.9695672 | 3.0816822 | -0.9636189 | 0.3794937 | 0.8004961 |
| Asv1886 | D_2__SAR202 clade | D_5__ | -2.2758300 | 2.2620814 | -1.0060778 | 0.3605552 | 0.8004961 |
| Asv1108 | D_2__Anaerolineae | D_5__uncultured | -1.2574021 | 1.7629163 | -0.7132512 | 0.5075876 | 0.8004961 |
| Asv408 | D_2__Anaerolineae | D_5__uncultured | -1.1503411 | 1.9735619 | -0.5828756 | 0.5852742 | 0.8004961 |
| Asv208 | D_2__Anaerolineae | D_5__uncultured | -1.1127006 | 1.4059595 | -0.7914172 | 0.4645707 | 0.8004961 |
| Asv428 | D_2__Anaerolineae | D_5__uncultured | -1.0569670 | 1.0591512 | -0.9979378 | 0.3641245 | 0.8004961 |
| Asv1144 | D_2__SAR202 clade | D_5__ | -0.6485262 | 1.1827890 | -0.5483025 | 0.6070659 | 0.8004961 |
| Asv114 | D_2__JG30-KF-CM66 | D_5__ | -0.6042138 | 1.4769566 | -0.4090938 | 0.6994049 | 0.8218007 |
| Asv1939 | D_2__Anaerolineae | D_5__Longilinea | -0.5119943 | 0.8044173 | -0.6364785 | 0.5524571 | 0.8004961 |
| Asv490 | D_2__Anaerolineae | D_5__uncultured | -0.4892390 | 0.7585529 | -0.6449637 | 0.5473730 | 0.8004961 |
| Asv1142 | D_2__JG30-KF-CM66 | D_5__ | -0.3868402 | 0.6996938 | -0.5528707 | 0.6041591 | 0.8004961 |
| Asv1266 | D_2__SAR202 clade | D_5__ | -0.3185743 | 0.6912399 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv1979 | D_2__Anaerolineae | D_5__uncultured | -0.2503083 | 0.5431170 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv2063 | D_2__Anaerolineae | NA | -0.1934201 | 0.4196813 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv1289 | D_2__Anaerolineae | D_5__uncultured | -0.1592871 | 0.3456199 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv1862 | D_2__Anaerolineae | D_5__uncultured | -0.1479095 | 0.3209328 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv1046 | D_2__Anaerolineae | D_5__uncultured | -0.1137765 | 0.2468714 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv2034 | D_2__SAR202 clade | D_5__ | -0.1023989 | 0.2221842 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv1693 | D_2__Anaerolineae | D_5__uncultured | -0.0964323 | 0.6505274 | -0.1482371 | 0.8879484 | 0.9517072 |
| Asv1260 | D_2__Anaerolineae | D_5__uncultured | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv341 | D_2__JG30-KF-CM66 | D_5__ | -0.0341330 | 0.0740614 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv2081 | D_2__Anaerolineae | D_5__uncultured | -0.0113777 | 0.0246871 | -0.4608737 | 0.6642415 | 0.8004961 |
| Asv2247 | D_2__JG30-KF-CM66 | D_5__ | 0.0112279 | 0.1763709 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv134 | D_2__Anaerolineae | NA | 0.0208518 | 0.3275459 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv477 | D_2__Anaerolineae | D_5__uncultured | 0.0256637 | 0.4031335 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv1664 | D_2__Anaerolineae | D_5__uncultured | 0.0368916 | 0.5795044 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv473 | D_2__Anaerolineae | D_5__uncultured | 0.0465155 | 0.7306794 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv1943 | D_2__Anaerolineae | D_5__uncultured | 0.0705752 | 1.1086170 | 0.0636606 | 0.9517072 | 0.9517072 |
| Asv400 | D_2__uncultured | D_5__ | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv496 | D_2__Dehalococcoidia | NA | 0.2211371 | 0.0196264 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1473 | D_2__SAR202 clade | NA | 0.3869900 | 0.0343462 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1794 | D_2__Anaerolineae | D_5__uncultured | 0.3869900 | 0.0343462 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1234 | D_2__SAR202 clade | D_5__ | 0.4422742 | 0.0392528 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv590 | D_2__Anaerolineae | D_5__uncultured | 0.4975585 | 0.0441594 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1003 | D_2__SAR202 clade | D_5__ | 0.4975585 | 0.0441594 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1282 | D_2__Anaerolineae | D_5__uncultured | 0.5528428 | 0.0490660 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1163 | D_2__Anaerolineae | D_5__uncultured | 0.7739799 | 0.0686923 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv2315 | D_2__Anaerolineae | D_5__uncultured | 0.9398327 | 0.0834121 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv555 | D_2__Anaerolineae | D_5__uncultured | 1.2162541 | 0.1079451 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv2324 | D_2__Anaerolineae | NA | 1.7690969 | 0.1570111 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1423 | D_2__Anaerolineae | D_5__Longilinea | 1.8243812 | 0.1619177 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1505 | D_2__Anaerolineae | D_5__uncultured | 1.8243812 | 0.1619177 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv271 | D_2__Anaerolineae | D_5__uncultured | 2.2113711 | 0.1962638 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1095 | D_2__Anaerolineae | NA | 2.6536454 | 0.2355166 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv161 | D_2__Anaerolineae | D_5__uncultured | 2.7089297 | 0.2404232 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1071 | D_2__Anaerolineae | D_5__uncultured | 5.5837121 | 0.4955662 | 11.2673382 | 0.0000962 | 0.0002380 |
| Asv1749 | D_2__Anaerolineae | D_5__uncultured | 8.4584946 | 0.7507092 | 11.2673382 | 0.0000962 | 0.0002380 |
Added a few sentences to the abstract regarding results for the correlation of selected genus abundance, oxygen levels and hydrogen sulfide concentrations.
Included a few key statements to highlight the aim of this study.
Not in charge.
Not in charge.
Analysis of DNA and RNA sequences obtained from Saanich Inlet using high-throughput sequencing has allowed us to examine the nitrogen cycle in this area. This study serves as a model for examining microbial community responses to environmental change, such as shifts in oxygen levels. Although various genes encode enzymes that are crucial to the nitrogen cycle, our study focuses on the abundance of the norC gene, which encodes the nitric oxide reductase subunit C, at various depths. Using the TreeSAPP pipeline, we showed that at all but one depth norC was more abundant at the genome level than at the expression level. Any classified families that contained norC in their genome were also found to express it, although the ratios between DNA and RNA abundance differed between the families. Gammaproteobacteria contributed the most to norC expression levels at all but one depth, and represented the class with the most norC at the genome level as well. A notable exception for expression levels was Epsilonproteobacteria, which expressed the most norC at 200m. norC expression correlated significantly with only one nitrogen species, NH4+, and the correlation was positive. Relations with all the other nitrogen species were negative and not significant. Although our study focused on one specific gene, the methods could be used to examine the abundance of other genes involved in the nitrogen cycle or in other biogeochemical processes, as well as to gain insight into how microbial communities respond to changes in their surrounding environment.
Saanich Inlet is a seasonally anoxic fjord located between Vancouver Island and the Saanich Peninsula [1]. It is 24 km long and has a basin of up to 234 meters in depth [2]. It has a 75-meter sill which acts to protect the deeper waters [3]. Because of this sill and the constantly high input of organic material from freshwater discharge and primary production in surface waters, its conditions below 110 meters are anoxic [3]. Oxygen replenishment is dependent on the season, occurring mostly in the fall, which modifies the oxygen gradient and thereby the environmental conditions for the microbial community that inhabits the inlet [3]. Dissolved oxygen increases gradually from a minimum concentration at greater depths up to its peak concentration at the surface, due to phytoplankton metabolism and gas exchange between the atmosphere and surface waters [3]. Nitrate reduction by denitrifiers happens mostly in the deep water following oxygenation [3]. This results in a steep nitrate gradient across the different depths within the fjord [3]. A study by Zaikova et al. found that microbial diversity was highest in the hypoxic transition area and decreased within the anoxic basin waters [1]. It is essential to study the roles of various microorganisms within Saanich Inlet in order to understand how they affect environmental conditions like greenhouse gases, methane, and denitrogenation on a larger scale in the world’s oceans [3].
Oxygen minimum zones (OMZs) are areas of the ocean, typically occurring between depths of about 200 and 1000 meters, where oxygen concentrations are at their lowest [4]. Global OMZs are normally found along the western boundaries of continents, where upwelling brings nutrient-rich waters from the deep ocean to surface waters. This influx of nutrients increases primary production and therefore the input of organic particles and respiration rates at depth [5]. OMZs are also found in coastal basins where restricted circulation decreases the mixing of deep and surface waters [5]. Saanich Inlet is a glacially carved fjord and is an example of a coastal basin in which water circulation is reduced by an entrance sill. This in turn restricts oxygen-rich water from entering the basin and creates a seasonally anoxic zone at depth [5].
One process carried out through multispecies microbial interactions is the conversion of inert elemental nitrogen gas (N2) into a form that plants can use for nucleic acid and protein synthesis [6]. The reductive and irreversible process of converting elemental N2 to NH4+ is catalyzed by nitrogenase, a conserved enzyme complex that is inhibited in the presence of oxygen [6]. NH4+ can be oxidized to nitrate (NO3−) in the presence of oxygen through a two-step process. The first step is the oxidation of ammonium (NH4+) to nitrite (NO2−) by a particular group of Bacteria or Archaea, and the second step involves oxidation of NO2− to NO3− by a different group of nitrifying microbes [6]. Finally, opportunistic microbes use NO2− and NO3− as electron acceptors in the absence of oxygen for the oxidation of organic matter. This process ultimately leads to the formation of N2 and completion of the nitrogen cycle [6]. The latter process is called denitrification and consists of the dissimilatory reduction of NO2− and NO3− by denitrifying facultative anaerobes to nitric oxide (NO) and nitrous oxide (N2O). These gaseous nitrogen oxides in turn act as terminal electron acceptors in anaerobic conditions, where they are reduced to dinitrogen (N2) [7]. Four functional enzymes are involved in the process of denitrification: nitrate, nitrite, nitric oxide, and nitrous oxide reductases, which are encoded by the nar, nir, nor, and nos gene clusters, respectively [8].
It is important to note that denitrification increases in oxygen minimum zones (OMZs) in the ocean. The lack of oxygen results in heterotrophic microbes using other oxidants, including nitrate and nitrite. This leads to loss of nitrogen from the ocean in the form of N2 [9]. Denitrification is an undesirable process for soil fertility and agricultural productivity, as it results in loss of fertilizer nitrogen (nitrate) from soil environments [7]. However, it plays a major role in the removal of nitrogen from waste, such as animal residues. Denitrification is of major ecological importance since it is responsible for the supply of N2O to the atmosphere, where it takes part in stratospheric reactions leading to the depletion of ozone [7]. The absence of this process would result in the accumulation of N2 in the atmosphere and of NO3−, a toxin if found at high concentrations, in water and soil environments. In conclusion, the absence of denitrification would result in disruption of the nitrogen cycle [7].
The microbial species involved in denitrification are able to use nitrogen oxides as electron acceptors in place of oxygen under anaerobic conditions. One group of denitrifying bacteria is photosynthetic; however, most are heterotrophs and some are autotrophs that utilize reduced sulphur compounds or H2 and CO2 [7]. Important denitrifiers isolated from soil and aquatic environments include members of the Achromobacter, Alcaligenes, Bacillus, Chromobacterium, Halobacterium, Hyphomicrobium, Moraxella, Paracoccus, and Pseudomonas genera [7]. Altogether, the nitrogen cycle could be more accurately represented as a complex metabolic network [10], and Saanich Inlet is a model ecosystem for studying how the components of that network are distributed across both taxa and depth [3]. We focused our investigations on norC, which is the component of the denitrification pathway responsible for the reduction of NO to N2O.
Water samples were collected from 16 depths (10-200m) at station S3 in Saanich Inlet (48°35.500 N, 123°30.300 W) onboard MSV John Strickland during cruise 72 (August 1, 2012). Water samples (2L) were filtered through a 0.22 µm Sterivex filter to collect biomass, which was stored at -80˚C for further multi-omic analyses. Geochemical data were measured in situ using a Sea-Bird SBE 43 and in the laboratory, using various assays, on the collected water samples [3, 11].
Total genomic DNA and RNA were extracted from Sterivex filters for 7 depths (10, 100, 120, 135, 150, 165, 200m). Illumina metagenomic shotgun libraries were constructed from reverse-transcribed RNA (cDNA) and genomic DNA, and were paired-end sequenced (2x150bp technology) on the Illumina HiSeq platform at the US Department of Energy Joint Genome Institute (DOE JGI). Processing and quality control of the output reads were also done at JGI using the IMG/M pipeline, and the assembly and processing of the metagenomes were done at the University of British Columbia using MetaPathways 2.5 [11].
In-depth sampling and sequencing methods can be found here: Hawley AK et al 2017, “A compendium of multi-omic sequence information from the Saanich Inlet water column” Sci Data 4: 170160.
Tree-based Sensitive and Accurate Protein Profiler (TreeSAPP) was used through Google Cloud services to reconstruct the nitrogen cycle in Saanich Inlet along defined redox gradients. Computational servers were linked to Google Cloud storage using the gcsfuse command. To allow for continuous processing of the data while offline the ‘nohup’ command was used. DNA and RNA assemblies were processed for norC (D0502) through the TreeSAPP pipeline and SAM files were immediately removed using the following code:
# Code for norC processing of the RNA and DNA samples
rm -rf treesapp_out
# Loop over the seven sequenced depths
for depth in 10m 100m 120m 135m 150m 165m 200m; do
# Genome level: map metagenomic reads to the DNA assembly
time treesapp.py -T 8 --verbose --delete -t D0502 --rpkm \
-i bucket/MetaG_assemblies/SI072_LV_"$depth"_DNA.scaffolds.fasta \
-r bucket/MetaG_reads/SI072_LV_"$depth".anqdp.fastq.gz \
-o treesapp_out/dna/"$depth"
# Remove the SAM files as soon as the run finishes
rm treesapp_out/dna/"$depth"/RPKM_outputs/*.sam
# Expression level: map metatranscriptomic reads to the same DNA assembly
time treesapp.py -T 8 --verbose --delete -t D0502 --rpkm \
-i bucket/MetaG_assemblies/SI072_LV_"$depth"_DNA.scaffolds.fasta \
-r bucket/MetaT_reads/SI072_LV_"$depth".qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp_out/rna/"$depth"
rm treesapp_out/rna/"$depth"/RPKM_outputs/*.sam
done
TreeSAPP output data found under treesapp_out/iTOL_output/ were visualized in a phylogenetic tree using Interactive Tree of Life (iTOL) software 4.2 v1.0 [12].
Analysis was completed in R v3.4.3 [13] using the following packages.
library(tidyverse)
library(cowplot)
library(phyloseq)
library(grid)
library(knitr)
Taxonomy, abundance, and query data were loaded from “marker_contig_map.tsv” files obtained from TreeSAPP.
treesapp.out.dir = "./treesapp_out_gene_tax_abd/"
# Example data loading
norC.DNA.10m = read_tsv(paste(treesapp.out.dir, "dna/10m/final_outputs/marker_contig_map.tsv", sep="")) %>%
select(Tax.DNA.10 = Confident_Taxonomy, Abund.DNA.10 = Abundance, Query)
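The example above loads a single file. As a minimal sketch, assuming every depth and molecule type follows the same directory layout and column naming as the 10m DNA example, the fourteen file paths and column names could be generated in base R rather than typed by hand. The `inputs` table and the `load_one()` helper are hypothetical names introduced here for illustration, and `read.delim()` stands in for `read_tsv()`.

```r
# Directory name as given in the text; depths and types as listed in the methods.
treesapp.out.dir <- "./treesapp_out_gene_tax_abd/"
depths <- c(10, 100, 120, 135, 150, 165, 200)
types <- c("dna", "rna")

# One row per (depth, type) combination. `inputs` is a hypothetical helper table.
inputs <- expand.grid(depth = depths, type = types, stringsAsFactors = FALSE)

# Assumption: every file sits at <dir>/<type>/<depth>m/final_outputs/marker_contig_map.tsv
inputs$path <- paste0(treesapp.out.dir, inputs$type, "/", inputs$depth,
                      "m/final_outputs/marker_contig_map.tsv")

# Column names following the Tax.DNA.10 / Abund.DNA.10 pattern used in the text
inputs$tax_col <- paste("Tax", toupper(inputs$type), inputs$depth, sep = ".")
inputs$abund_col <- paste("Abund", toupper(inputs$type), inputs$depth, sep = ".")

# Hypothetical loader: reads one file and renames its columns.
# It is only defined here, not called, so the sketch runs without the data files.
load_one <- function(path, tax_col, abund_col) {
  df <- read.delim(path, stringsAsFactors = FALSE)
  out <- df[, c("Confident_Taxonomy", "Abundance", "Query")]
  names(out) <- c(tax_col, abund_col, "Query")
  out
}
```

With the files in place, `Map(load_one, inputs$path, inputs$tax_col, inputs$abund_col)` would return one data frame per file, ready to be joined on `Query`.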
The data were then combined into a single data frame.
# Combine the data frames
norC.all = norC.DNA.10m %>%
full_join(norC.DNA.100m, by = "Query") %>%
full_join(norC.DNA.120m, by = "Query") %>%
full_join(norC.DNA.135m, by = "Query") %>%
full_join(norC.DNA.150m, by = "Query") %>%
full_join(norC.DNA.165m, by = "Query") %>%
full_join(norC.DNA.200m, by = "Query") %>%
full_join(norC.RNA.10m, by = "Query") %>%
full_join(norC.RNA.100m, by = "Query") %>%
full_join(norC.RNA.120m, by = "Query") %>%
full_join(norC.RNA.135m, by = "Query") %>%
full_join(norC.RNA.150m, by = "Query") %>%
full_join(norC.RNA.165m, by = "Query") %>%
full_join(norC.RNA.200m, by = "Query") %>%
# Create a taxonomy variable aggregating all taxonomy columns to fill in any NAs
mutate(Taxonomy = ifelse(!is.na(Tax.DNA.10), Tax.DNA.10,
ifelse(!is.na(Tax.DNA.100), Tax.DNA.100,
ifelse(!is.na(Tax.DNA.120), Tax.DNA.120,
ifelse(!is.na(Tax.DNA.135), Tax.DNA.135,
ifelse(!is.na(Tax.DNA.150), Tax.DNA.150,
ifelse(!is.na(Tax.DNA.165), Tax.DNA.165,
ifelse(!is.na(Tax.DNA.200), Tax.DNA.200,
ifelse(!is.na(Tax.RNA.10), Tax.RNA.10,
ifelse(!is.na(Tax.RNA.100), Tax.RNA.100,
ifelse(!is.na(Tax.RNA.120), Tax.RNA.120,
ifelse(!is.na(Tax.RNA.135), Tax.RNA.135,
ifelse(!is.na(Tax.RNA.150), Tax.RNA.150,
ifelse(!is.na(Tax.RNA.165), Tax.RNA.165,
ifelse(!is.na(Tax.RNA.200), Tax.RNA.200,
"unclassified"))))))))))))))) %>%
# Remove old Tax variables
select(-starts_with("Tax.")) %>%
# Gather all the abundance data into 1 column
gather("Key", "Abundance", starts_with("Abund")) %>%
# Turn the Key into Depth and RNA/DNA variables
separate(Key, c("Key","Type","Depth_m"), sep = "\\.") %>%
# Remove Key variable and reorder the columns with Query at the end
select(Depth_m, Type, Abundance, Taxonomy, Query) %>%
# Make depth numerical
mutate(Depth_m = as.numeric(Depth_m)) %>%
# Separate Taxonomy into columns
separate(Taxonomy, into = c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species"), sep="; ")
Note that not all queries could be classified at the species level, so those cells were filled in with NA.
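To illustrate how missing ranks end up as NA, here is a small base R imitation of the taxonomy split. It is not the `separate()` call used above, only a sketch of the same idea, and the two lineage strings are hypothetical examples written in the "; "-delimited style of the taxonomy column.

```r
ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Hypothetical lineage strings; the first stops at order, the second at class.
taxonomy <- c(
  "Bacteria; Proteobacteria; Gammaproteobacteria; Thiotrichales",
  "Bacteria; Proteobacteria; Epsilonproteobacteria"
)

# Hypothetical helper: split each lineage on "; " and pad to seven ranks.
split_taxonomy <- function(x, ranks) {
  parts <- strsplit(x, "; ", fixed = TRUE)
  padded <- lapply(parts, function(p) {
    length(p) <- length(ranks)  # extending a vector fills the new slots with NA
    p
  })
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- ranks
  out
}

tax_table <- split_taxonomy(taxonomy, ranks)
print(tax_table)
```

Ranks below the deepest classified level stay NA, which is why the plotting code later relabels NA families and classes as "unclassified".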
Geochemical data from cruise 72 (Aug 1, 2012) were loaded in order to draw correlations with norC abundance.
load("mothur_phyloseq.RData")
metadata = data.frame(mothur@sam_data)
The variation in the abundance of norC DNA and RNA with depth was examined by plotting the abundances at the different depths. Both abundances were then stratified at the family and class level as well. The presence of norC RNA was then examined in relation to N2O, NH4+, NO2-, NO3-, and O2 at the different depths. The abundances were also regressed against the concentrations of N2O, NH4+, NO2-, NO3-, and O2 with linear models, fitted using the lm() function in the stats R package as per the code found in the results section [14].
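As a minimal sketch of the linear-model step, the snippet below fits `lm()` to a toy data set and pulls out the slope statistics that make up one row of a coefficient table. The numbers in `toy` are invented for illustration and are not Saanich Inlet measurements; only the column names `uM` and `Abundance` mirror the analysis code.

```r
# Toy data: hypothetical concentrations (uM) and abundances, for illustration only.
toy <- data.frame(
  uM = c(0, 1, 2, 3, 4),
  Abundance = c(1.1, 2.9, 5.2, 6.8, 9.1)
)

# Fit abundance as a linear function of concentration
fit <- lm(Abundance ~ uM, data = toy)

# The "uM" row of the coefficient matrix holds the slope estimate,
# its standard error, the t statistic and the p-value
slope_stats <- summary(fit)$coefficients["uM", ]
print(round(slope_stats, 3))
```

The sign of `Estimate` gives the direction of the relationship, and `Pr(>|t|)` is the p-value for the slope.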
# Save gene abundance data
gene_abd_depth = norC.all %>%
# Group data by depth and type (RNA/DNA)
group_by(Depth_m, Type) %>%
# Filter out NAs
filter(!is.na(Abundance)) %>%
# Sum up the abundance data for each depth for RNA and DNA
summarize(Abundance = sum(Abundance)) %>%
# Round the sums
mutate(Abundance = round(Abundance, digits = 2))
# Plot the abundance data
ggplot(gene_abd_depth, aes(x = Type, y = Depth_m)) +
geom_point(aes(size = Abundance)) +
geom_text(aes(label = Abundance, hjust = -0.5)) +
scale_y_reverse(lim=c(200,10)) +
labs(x = NULL) +
theme(legend.position="none")
Figure 1 RPKM abundance (value labels) of norC at the genome and expression level across different depths for samples obtained at Saanich Inlet.
The abundance of norC differs with depth (Figure 1). No norC was found at either the genome or expression level at 10m. DNA abundance increases with depth until 150m, where it is maximal. It then drops abruptly at 165m and is relatively high again at 200m. RNA abundance is relatively low and increases only slowly down to 135m. It is notably greater at 150m, reaches its maximum at 165m, and decreases again at 200m.
The graph shows that the abundance of DNA differs from the abundance of RNA. DNA abundance is notably higher than RNA abundance at depths down to 150m. DNA abundance is maximal at 150m, where RNA abundance is lower, and RNA abundance is maximal at 165m, where DNA abundance is quite low. We see about the same amount of norC RNA and DNA at 200m.
norC.all %>%
# Change NAs to "unclassified"
mutate(Family = ifelse(is.na(Family), "unclassified", Family)) %>%
# Remove 0 abundance
mutate(Abundance = ifelse(Abundance == 0, NA, Abundance)) %>%
# Plot the data
ggplot(aes(x = Family, y = Depth_m)) +
# Keep points from overlapping
geom_point(aes(size = Abundance, color = Type), position = position_dodge(0.6)) +
scale_y_reverse(lim=c(200,10)) +
labs(x = "Family") +
theme(axis.text.x = element_text(angle = 20, hjust = 1), plot.margin= margin(0,0,0,20))
Figure 2 RPKM abundance of norC stratified at the family level. Abundance is shown at both the genome and expression level across different depths. The size of each circle is proportional to abundance.
In total, seven fully classified families were found to contribute to the norC DNA and RNA levels, along with some unclassified Thiotrichales and fully unclassified organisms (Figure 2). All classified families that contribute to the DNA levels were also found to contribute a nonzero amount to the RNA levels.
The number of classified families contributing to the DNA and RNA levels differs between depths. The smallest number of classified families, only three, contribute at depths of 100m, 120m, and 165m, while the largest number, six, contribute at a depth of 150m. Some families were found to contribute to norC levels at only one depth, while others contribute at several depths.
At every depth, the family Candidatus Thioglobus was found to contribute the most to norC DNA abundance, while the second biggest contributors were unclassified organisms. The biggest contributors to RNA abundance at the majority of depths were unclassified organisms. At 200m, however, Helicobacteraceae contribute the most to RNA abundance.
norC.all %>%
# Change NAs to "unclassified"
mutate(Class = ifelse(is.na(Class), "unclassified", Class)) %>%
# Remove 0 abundance
mutate(Abundance = ifelse(Abundance == 0, NA, Abundance)) %>%
ggplot(aes(x = Class, y = Depth_m)) +
# Keep points from overlapping
geom_point(aes(size = Abundance, color = Type), position = position_dodge(0.6)) +
scale_y_reverse(lim=c(200,10)) +
labs(x = "Class") +
theme(axis.text.x = element_text(angle = 20, hjust = 1))
Figure 3 RPKM abundance of norC stratified at the class level. Abundance is shown at both the genome and expression level across different depths. The size of each circle is proportional to abundance.
Viewed at the class level, it can be seen that the unclassified organisms responsible for most of the RNA are of the class Gammaproteobacteria (Figure 3).
These patterns can also be seen by visualizing the data on iTOL (sample in Appendix Figure A1). The expression of norC is observed to be localized to a few classes. Classes with the highest levels of norC expression are variable with respect to depth. Epsilonproteobacteria and Ardenticatenia were observed to have greater expression at 135m or shallower, while Gammaproteobacteria expression is more biased towards greater depths.
# Save plot of abundance data
plot_abd_depth = ggplot(gene_abd_depth, aes(x = Type, y = Depth_m)) +
geom_point(aes(size = Abundance)) +
scale_y_reverse(lim=c(200,10)) +
labs(x = "", y = NULL) +
theme(plot.margin = unit(c(0,0,0.55,0), "cm"))
# Order the data by depth
metadata_depth = metadata %>% arrange(Depth_m)
# Example plot code
# Save plot for NO3 metadata as it changes with depth
plot_NO3 = metadata_depth %>%
ggplot(aes(x = NO3_uM, y = Depth_m)) +
geom_point() +
geom_path(aes(group = 1)) +
scale_y_reverse(lim=c(200,10)) +
labs(y = "Depth (m)", x = "NO3 (uM)") +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
plot.margin = unit(c(0,0.1,0.3,0), "cm"))
# Create composite figure
plot_grid(plot_N2O, plot_NH4, plot_NO2, plot_NO3, plot_O2, plot_abd_depth,
labels=c("A", "B", "C", "D", "E", "F"),
nrow = 1,
rel_widths=c(1.2,1,1,1,1,2.4))
Figure 4 Nitrogen species and oxygen concentrations across depth (A-E) compared to norC abundance at both the genome and expression level (F; reproduced from Figure 1)
The relation between norC abundance and nitrogen species and oxygen was examined (Figure 4). NO3- and N2O concentrations are seen to peak at 100m, where some norC DNA is observed, but not much norC RNA. NO3- and N2O then decrease with greater depth, as DNA and RNA levels increase. NO2- is relatively constant, between about 0.05 and 0.10 uM, at depths from 100m to 165m, where norC DNA and RNA levels increase. NH4+, in contrast, increases in concentration with depth starting at 150m, seemingly in step with RNA levels. Oxygen concentration was lower where RNA abundance was higher. These possible relationships were further examined with linear models.
# Combine gene abundance data with metadata
gene_abd_depth_meta = metadata %>%
# Change N2O to also be in uM like the others
mutate(N2O_uM = N2O_nM / 1000) %>%
# Select columns of interest
select(Depth_m, NH4_uM, N2O_uM, NO2_uM, NO3_uM, O2_uM) %>%
# Rename the columns
rename("NH4 (uM)" = NH4_uM) %>%
rename("N2O (uM)" = N2O_uM) %>%
rename("NO2 (uM)" = NO2_uM) %>%
rename("NO3 (uM)" = NO3_uM) %>%
rename("O2 (uM)" = O2_uM) %>%
# Join the abundance data and metadata
gather(Nutrient,uM,-Depth_m) %>%
full_join(gene_abd_depth, by = "Depth_m") %>%
unite(Type_Nutrient, Type, Nutrient, sep = " - ") %>%
arrange(desc(Type_Nutrient))
# Plot the data
ggplot(gene_abd_depth_meta, aes(x = uM, y=Abundance)) +
geom_point() +
facet_wrap(~Type_Nutrient, scales="free", ncol = 5) +
geom_smooth(method='lm') +
labs(y = "Abundance", x = "Concentration (uM)") +
theme(text = element_text(size=18))
Figure 5 RPKM abundance of norC at different nitrogen species concentrations. Abundance is shown at both the genome and expression level.
The presence and expression of the norC gene were positively associated only with NH4+ levels, while all other correlations revealed negative trends.
# Initialize a data frame
RNA_abundance = data.frame(matrix(ncol = 5, nrow= 0))
for (type_nutrient in unique(gene_abd_depth_meta$Type_Nutrient)) {
# Get the summary of lm()
summ = gene_abd_depth_meta %>%
filter(Type_Nutrient == type_nutrient) %>%
lm(Abundance ~ uM, .) %>%
summary()
# Pull out the statistics values
coef = summ$coefficients["uM",]
# Convert into a data frame
coef_data_frame = data.frame(coef=matrix(coef),row.names=names(coef))
# Add to data frame for table
RNA_abundance[nrow(RNA_abundance) + 1,] <-c(type_nutrient, round(coef_data_frame[,1], 3))
}
# Reverse order to agree with graphs
RNA_abundance = RNA_abundance[nrow(RNA_abundance):1,]
# Name the columns
colnames(RNA_abundance) <- (c("Correlation", "Estimate", "Std. Error","t value","Pr(>|t|) (p-value)"))
# Make a table for the data
kable(RNA_abundance,row.names=FALSE,caption="Table 1 Correlation of *norC* Abundance across Nitrogen Species and Oxygen Concentrations.")
| Correlation | Estimate | Std. Error | t value | Pr(>\|t\|) (p-value) |
|---|---|---|---|---|
| DNA - N2O (uM) | -606.402 | 9418.328 | -0.064 | 0.951 |
| DNA - NH4 (uM) | 25.255 | 21.004 | 1.202 | 0.283 |
| DNA - NO2 (uM) | -2031.095 | 1383.87 | -1.468 | 0.202 |
| DNA - NO3 (uM) | -2.491 | 6.099 | -0.408 | 0.7 |
| DNA - O2 (uM) | -1.318 | 0.644 | -2.046 | 0.096 |
| RNA - N2O (uM) | -11946.518 | 9699.499 | -1.232 | 0.273 |
| RNA - NH4 (uM) | 56.903 | 11.746 | 4.844 | 0.005 |
| RNA - NO2 (uM) | -2268.704 | 1659.945 | -1.367 | 0.23 |
| RNA - NO3 (uM) | -11.349 | 5.228 | -2.171 | 0.082 |
| RNA - O2 (uM) | -1.27 | 0.855 | -1.485 | 0.198 |
NorC expression was found to significantly positively correlate with NH4+ levels (p = 0.005). No other significant correlations were observed at either the genome or the expression level.
Across the nitrogen species, p-values at the expression level (p = 0.005-0.273) were generally lower, indicating stronger evidence of association, than those at the genome level (p = 0.202-0.951).
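As a sanity check on Table 1, the t value of each row is the estimate divided by its standard error, and the two-sided p-value follows from the t distribution. The sketch below redoes this arithmetic for the RNA - NH4 row. The residual degrees of freedom (5, corresponding to seven observations minus two fitted parameters) are an assumption that is not stated in the text, so the result is only expected to land close to the tabulated p-value.

```r
# Values copied from the RNA - NH4 (uM) row of Table 1
estimate  <- 56.903
std_error <- 11.746

# Assumption: seven sampled depths, so 7 - 2 = 5 residual degrees of freedom
df_resid <- 5

# t statistic for the slope
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(c(t = t_value, p = p_value), 3))
```

Small differences from the table are possible because the inputs here are already rounded to three decimals.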
Although we investigated only a single component of the denitrification pathway, the norC results can be placed in the context of the broader nitrogen metabolic network. At the shallow depth of 10 m, norC gene copies and transcripts were not found at detectable levels (Figure 1), which suggests that the complete denitrification pathway is not active there. There are at least two plausible reasons for this. First, ample oxygen and sunlight are available for phototrophs to produce energy without requiring the less efficient denitrification pathway [15, 16]. Second, nitrate is depleted at this depth (Figure 4), and any free nitrogen is likely required for primary production and for assimilatory mechanisms that support amino acid and nucleotide synthesis [17].
Aerobic denitrification has been found in harsh environments where the oxygen concentration fluctuates. It is the main source of atmospheric N2O, which has been rising due to the uncontrolled use of fertilizer in agriculture and the intensification of OMZs globally [18]. The trends in NO3-, NO2- and NH4+ levels across depths in Saanich Inlet (Figure 4) are consistent with literature in which heterotrophic microbes were found to use other oxidants, such as NO3- and NO2-, in anoxic environments [19]. As a result of this shift in metabolic activity from oxic to anoxic waters, NO3- and NO2- are consumed and depleted in anoxic environments while NH4+ is produced. While denitrifying microbes produce NO and N2O, other heterotrophic microbes in anoxic environments reduce NO3- and NO2- to NH4+, consistent with the high NH4+ concentrations observed in the anoxic layer (Figure 4) [20]. The relationship between NH4+ levels and norC expression is positive and significant (Table 1).
Interestingly, the concentration of N2O peaks at 100 m and gradually decreases, reaching a minimum at 200 m (Figure 4). This may seem paradoxical: the abundance of norC gene copies and transcripts increases with depth, and norC is responsible for the production of N2O [20], so the concentration of N2O would be expected to increase with depth as well. However, N2O production also depends on other genes involved in the denitrification pathway, such as the nar, nir, nor, and nos gene clusters [10]. In addition, N2O production depends on NO3- levels, which follow a trend similar to that of N2O concentrations (Figure 4). It is therefore tempting to argue that the highest N2O concentrations are found at ~100 m because the denitrification pathway depends on NO3- as a substrate. However, NO3- should not be abundant where denitrification rates are high, since it is consumed by this process, and denitrification rates have been shown to be greater at depths where oxygen is depleted [21]. Studies have also shown that N2O is consumed by microbes during the last step of canonical denitrification, which occurs in strictly anoxic environments [22]. This suggests that the low N2O levels within the anoxic layer do not reflect a low rate of N2O production, but rather the production of N2O followed by its consumption by the microbial community. This last step of the canonical denitrification pathway, in which N2O is consumed, is carried out by the product of the nitrous oxide reductase gene (nosZ), so it would be interesting to investigate how active this gene is at depth in the inlet.
Many of the norC-containing species we identified are known to be important in the nitrogen cycle. Among the identified classes, Gammaproteobacteria contributed the most to norC DNA and RNA abundance across the majority of depths (Figure 3). This comes as no surprise because Gammaproteobacteria are known to be common marine denitrifiers [23, 24]. At the family taxonomic level, we found that Candidatus Thioglobus contributed the most to norC DNA abundance (Figure 2). This is consistent with the literature, as members of that group have been found to produce nitrite and are also abundant in OMZs [25]. Interestingly, the family Helicobacteraceae has been found to co-occur with denitrification on elemental sulfur particles [26]. Our results support this, because we observed high norC expression in Helicobacteraceae only at a depth of 200 m, where sulfur is expected to be highest (Figure 2) [3]. Elucidating which clades contribute to the denitrification pathway by expressing norC is important because it completes one piece of the puzzle that is the distributed nitrogen metabolic network.
Distributed metabolism, as demonstrated by the nitrogen cycle, is beneficial for the overall survival of the pathway. For example, nitrogenase genes are spread among different microbes in aerobic and anaerobic environments [27]. This broad distribution of nitrogen-fixing genes reflects the different environments these microbes inhabit [27]. Since the nitrogen cycle is vital for the survival of many plants and animals, a wider distribution of its important genes and components increases the likelihood that those genes persist through environmental adaptation. Without norC, the denitrification pathway would be incomplete, which would affect the entire nitrogen cycle. Denitrification converts nitrate to nitrogen gas, removing bioavailable nitrogen and returning it to the atmosphere [27]. Without this step, nitrogen would remain fixed as nitrate in the ecosystem [27]. In terrestrial ecosystems, adding nitrogen can cause nutrient imbalance in trees and change forest health, eventually leading to a decline in biodiversity [27]. With recent developments in agriculture, much of the nitrogen used in urban and agricultural activities eventually leaches out of the soil and into bodies of water such as rivers and streams [27]. In nearshore marine systems, excess nitrogen can lead to eutrophication [27]. Having vital nitrogen genes distributed among different microbial species helps avoid such catastrophic changes by preventing the accumulation of nitrogen in the anoxic zones of lakes and oceans. If some species cannot adapt to extreme environmental pressures, whether these occur naturally or through human industrial alterations, their loss will not cause the nitrogen cycle to break down. Other microbes that can survive harsh environments and carry the corresponding genes will take their place within the denitrification step, ensuring that the major transformations within the cycle continue.
When we consider the evolutionary benefits of distributing a metabolism among different microbes, one potential benefit is that it allows microbes to make their genomes smaller, and a smaller genome can provide a selective advantage. Giovannoni described how B vitamins are distributed in complex patterns [28]. An important point he raises is that the survival of the different plankton species depends on vitamins that are made within the community: “a cell that needs a vitamin can only thrive in a community that provides it” [28]. Connectedness has also become an important concept because it can help explain the response of biological communities to stresses [28]. Giovannoni observes that in fluctuating ocean conditions, the fluxes of reduced sulfur and glycine are sufficient to meet a certain species’ requirement most of the time [28]. By relying on reduced sulfur and glycine from the environment, these microbes no longer need the genes to produce those compounds themselves, which reduces the energy they expend. This suggests that the fitness advantages of smaller genomes and lower cellular costs offset the potential gain in fitness that would come from producing the required compounds when they are in short supply in the environment [28]. This example does not concern the nitrogen cycle directly, but the same concept applies: when the genes for the nitrogen cycle are distributed across a community, individual microbes can maintain smaller genomes and gain a selective advantage for survival.
Future studies should examine the potential presence of metabolic interactions between genera expressing the norC gene. Such interactions may alter the abundance and the role of these genera within the community at a specific depth. Although our study focused on one specific gene, the same methods could be used to examine the abundance of other genes and gain further insight into other biogeochemical processes driven by microbial metabolism. One could also examine the genomes of the organisms present in the sample to determine whether any other metabolisms are distributed among the species. Perhaps other pathways would be discovered in which the bacteria are interdependent, each providing substrates for the other.
[1] Zaikova E, Walsh DA, Stilwell CP, Mohn WW, Tortell PD, Hallam SJ. 2010. Microbial community dynamics in a seasonally anoxic fjord: Saanich Inlet, British Columbia. Environmental Microbiology 12:172-191.
[2] Herlinveaux RH. 2011. Journal of the Fisheries Research Board of Canada 19: 1-37.
[3] Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Scofield M, Payne C, Pakhomova L, Kheirandish S, Finke J, Bhatia M, Shevchuk O, Gies EA, Fairley D, Michiels C, Suttle CA, Whitney F, Crowe SA, Tortell PD, Hallam SJ. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data 4:170159.
[4] Oxygen Minimum Zones. Keil Lab Aquatic Organic Geochemistry UW Oceanography. http://depts.washington.edu/aog/oxygen-minimum-zones/
[5] OMZ Microbes - A SCOR working group. About OMZs. http://omz.microbiology.ubc.ca/page2/index.html
[6] Falkowski PG, Fenchel T, Delong EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science 320:1034-1039.
[7] Knowles R. 1982. Denitrification. Microbiological Reviews 46:43-70.
[8] Bergaust L, Spanning RJM, Frostegard A, Bakken LR. 2010. Expression of nitrous oxide reductase in Paracoccus denitrificans is regulated by oxygen and nitric oxide through FnrP and NNR. Microbiology 158:826-834.
[9] Altabet MA, Ryabenko E, Stramma L, Wallace DWR, Frank M, Grasse P, Lavik G. 2012. An eddy-stimulated hotspot for fixed nitrogen-loss from the Peru oxygen minimum zone. Biogeosciences 9:4897-4908.
[10] Simon J, Klotz MG. 2013. Diversity and evolution of bioenergetic systems involved in microbial nitrogen compound transformations. Biochimica et Biophysica Acta (BBA) - Bioenergetics 1827:114–135.
[11] Hawley AK, Torres-Beltrán M, Zaikova E, Walsh DA, Mueller A, Scofield M, Kheirandish S, Payne C, Pakhomova L, Bhatia M, Shevchuk O, Gies EA, Fairley D, Malfatti SA, Norbeck AD, Brewer HM, Pasa-Tolic L, Rio TGD, Suttle CA, Tringe S, Hallam SJ. 2017. A compendium of multi-omic sequence information from the Saanich Inlet water column. Sci Data 4:170160.
[12] Letunic I, Bork P. 2007. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23:127–128.
[13] R Core Team. 2017. R: A language and environment for statistical computing. 3.4.3. R Foundation for Statistical Computing, Vienna, Austria.
[14] Stats. (n.d.). R Documentation. Retrieved April 04, 2018, from https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/lm
[15] Zehr JP, Kudela RM. 2009. Photosynthesis in the Open Ocean. Science 326:945–946.
[16] Koike I, Hattori A. 1975. Energy Yield of Denitrification: An Estimate from Growth Yield in Continuous Cultures of Pseudomonas denitrificans under Nitrate-, Nitrite- and Nitrous Oxide-limited Conditions. Journal of General Microbiology 88:11–19.
[17] Wawrik B, Boling WB, Nostrand JDV, Xie J, Zhou J, Bronk DA. 2011. Assimilatory nitrate utilization by bacteria on the West Florida Shelf as determined by stable isotope probing and functional microarray analysis. FEMS Microbiology Ecology 79:400–411.
[18] Ji B, Yang K, Zhu L, Jiang Y, Wang H, Zhou J, Zhang H. 2015. Aerobic denitrification: A review of important advances of the last 30 years. Biotechnology and Bioprocess Engineering 20:643–651.
[19] Altabet MA, Ryabenko E, Stramma L, Wallace DWR, Frank M, Grasse P, Lavik G. 2012. An eddy-stimulated hotspot for fixed nitrogen-loss from the Peru oxygen minimum zone. Biogeosciences 9:4897-4908.
[20] Martínez-Espinosa RM, Cole JA, Richardson DJ, Watmough NJ. 2011. Enzymology and ecology of the nitrogen cycle. Biochemical Society Transactions 39:175–178.
[21] Bergaust L, Spanning RJM, Frostegard A, Bakken LR. 2010. Expression of nitrous oxide reductase in Paracoccus denitrificans is regulated by oxygen and nitric oxide through FnrP and NNR. Microbiology 158:826-834.
[22] Sun X, Jayakumar A, Ward BB. 2017. Community Composition of Nitrous Oxide Consuming Bacteria in the Oxygen Minimum Zone of the Eastern Tropical South Pacific. Frontiers in Microbiology 8.
[23] Brettar I, Moore E, Höfle M. 2001. Phylogeny and Abundance of Novel Denitrifying Bacteria Isolated from the Water Column of the Central Baltic Sea. Microbial Ecology 42:295–305.
[24] Franco DC, Signori CN, Duarte RTD, Nakayama CR, Campos LS, Pellizari VH. 2017. High Prevalence of Gammaproteobacteria in the Sediments of Admiralty Bay and North Bransfield Basin, Northwestern Antarctic Peninsula. Frontiers in Microbiology 08.
[25] Shah V, Chang BX, Morris RM. 2016. Cultivation of a chemoautotroph from the SUP05 clade of marine bacteria that produces nitrite and consumes ammonium. The ISME Journal 11:263–271.
[26] Kostrytsia A, Papirio S, Frunzo L, Mattei MR, Porca E, Collins G, Lens PN, Esposito G. 2018. Elemental sulfur-based autotrophic denitrification and denitritation: microbially catalyzed sulfur hydrolysis and nitrogen conversions. Journal of Environmental Management 211:313–322.
[27] Bernhard A. 2010. The Nitrogen Cycle: Processes, Players, and Human Impact. Nature Education Knowledge 3:25.
[28] Giovannoni SJ. 2012. Vitamins in the sea. Proceedings of the National Academy of Sciences 109:13888–13889.
Figure A1 FPKM RNA abundance (coloured bars from purple to red) of norC expression across sample depth with respect to taxonomic assignments. Selected clades are numbered (1: Gammaproteobacteria; 2: Epsilonproteobacteria; 3: Ardenticatenia; 4: Clostridia; 5: Negativicutes). FPKM bar heights are relative within the same colour only. NorC expression is not observed in species outside of what is shown.